•
    Recurrent policy gradients
    with Daan Wierstra, Jan Peters, and Jürgen Schmidhuber
    Logic Journal of the IGPL 18 (5): 620-634. 2010.
    Reinforcement learning for partially observable Markov decision problems is challenging because it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions about the problem domain, such as perfect system models, state estimators, and a Markovian hidden system. Recurrent neural networks offer a natural framework for policy learning with hidden state and require only a few limiting assumptions. As they can be…