Different from existing reinforcement learning algorithms that generate only reactive policies, and from existing probabilistic planning algorithms that require a substantial amount of a priori knowledge in order to plan, we devise a two-stage bottom-up learning-to-plan process. First, reinforcement learning/dynamic programming is applied, without the use of a priori domain-specific knowledge, to acquire a reactive policy; then explicit plans are extracted from the learned reactive policy. Plan extraction is based on a beam search algorithm that performs temporal projection in a restricted fashion, guided by the value functions resulting from reinforcement learning/dynamic programming. Experiments and theoretical analysis are presented.
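The two stages described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the five-state corridor, the 0.8 success probability, the `score` heuristic, and every function name are hypothetical choices made for illustration. Stage one uses value iteration over a known toy model as a stand-in for the learning step (in the setting of the abstract, values and transition estimates would come from experience rather than from a given model). Stage two runs a beam search over open-loop action sequences, projecting the state distribution forward one step at a time and ranking candidates by the learned values.

```python
from collections import defaultdict

# Hypothetical toy domain: a corridor of 5 states with the goal at the right end.
N_STATES, GOAL, GAMMA = 5, 4, 0.9
ACTIONS = ("left", "right")


def transitions(s, a):
    """Assumed model: the intended move succeeds with prob. 0.8, else the agent stays."""
    if s == GOAL:
        return {GOAL: 1.0}  # the goal is absorbing
    target = max(s - 1, 0) if a == "left" else min(s + 1, N_STATES - 1)
    probs = defaultdict(float)
    probs[target] += 0.8
    probs[s] += 0.2
    return dict(probs)


def reward(s, s2):
    """Reward 1 on first entering the goal, 0 otherwise."""
    return 1.0 if s != GOAL and s2 == GOAL else 0.0


def learn_q(sweeps=200):
    """Stage 1 (stand-in): value iteration yields Q-values, i.e. a reactive policy."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(N_STATES):
            for a in ACTIONS:
                q[s, a] = sum(
                    p * (reward(s, s2) + GAMMA * max(q[s2, b] for b in ACTIONS))
                    for s2, p in transitions(s, a).items()
                )
    return q


def project(dist, a):
    """Temporal projection: push a state distribution through one action."""
    nxt = defaultdict(float)
    for s, p in dist.items():
        for s2, p2 in transitions(s, a).items():
            nxt[s2] += p * p2
    return dict(nxt)


def score(dist, q):
    """Illustrative heuristic: goal mass counts as 1, other states by their value."""
    return sum(
        p * (1.0 if s == GOAL else max(q[s, a] for a in ACTIONS))
        for s, p in dist.items()
    )


def extract_plan(q, start=0, horizon=6, beam_width=2):
    """Stage 2: beam search over action sequences, guided by the learned values."""
    beam = [((), {start: 1.0})]
    for _ in range(horizon):
        candidates = [
            (plan + (a,), project(dist, a)) for plan, dist in beam for a in ACTIONS
        ]
        candidates.sort(key=lambda c: score(c[1], q), reverse=True)
        beam = candidates[:beam_width]  # restricted projection: keep the best few
    plan, dist = beam[0]
    return plan, dist.get(GOAL, 0.0)


q = learn_q()
plan, p_goal = extract_plan(q)
print(plan, round(p_goal, 5))
```

In this toy domain the extracted plan is six consecutive `right` actions, and the second return value is the probability that the open-loop plan reaches the goal within the horizon. The beam width bounds how many partial plans are projected at each step, which is the sense in which the projection is restricted.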