Planning as Heuristic Forward Search
Brian C. Williams
March 31, 2004
16.412J/6.834J
Slides with help from: Prof. Maria Fox

Readings in Planning as Forward Heuristic Search
- "The FF Planning System: Fast Plan Generation Through Heuristic Search," by Jorg Hoffmann and Bernhard Nebel, Journal of Artificial Intelligence Research, 2001.
- "Planning as Heuristic Search," by Blai Bonet and Hector Geffner, Artificial Intelligence Journal, 2001.

Outline
- Introduction to FF
- FF Search Algorithm
- FF Heuristic Fn
- FF Example
- Appendix: HSP

Example: Polish
Move from room x to room y
- pre: robot is in x, door open
- add: robot is in y
- del: robot is in x
Open door
- pre: door is closed
- add: door is open
- del: door is closed
Close door
- pre: door is open
- add: door is closed
- del: door is open
Polish
- pre: door open
- add: floor polished
Initial State
- In(A)
- Closed
Final State
- In(B)
- Closed
- Polished

Planning as Forward Heuristic Search
- Planning can be seen as a state-space search for a path from the initial state to a goal state.
- Planning research has largely not been concerned with finding optimal solutions, although heuristics give preference to shorter plans.
- Planning research has largely used incomplete or uninformed search methods:
  - breadth-first search,
  - meta search rules.
- The size of most state spaces requires informative heuristics to guide the search.

Review: Search Strategies
- Breadth-first search (uninformed): systematic search of the state space in layers.
- A* search (informed): expands the search node with the best estimated cost, where estimated cost = cost-so-far + optimistic-cost-to-go.
- Greedy search: expands the search node closest to the goal according to a heuristic function.
- Hill-climbing search: moves towards the goal by random selection from among the best children.
- To apply informed search to planning, we need a heuristic fn.

Fast Forward (FF)
- Forward-chaining heuristic search planner.
- Basic principle: hill-climb through the space of problem states, starting at the initial state.
- Each child state results from applying a single plan operator.
- Always move to the first child state found that is closer to the goal.
- Record the operators applied along the path.
- The operators leading to the goal constitute a plan.

Outline
- Introduction to FF
- FF Search Algorithm
- FF Heuristic Fn
- FF Example
- Appendix: HSP

Planning Problem and State Space
- A planning problem is a tuple <P, A, I, G>:
  - propositions P,
  - ground actions A (instantiated operators),
  - initial state I, a subset of P, and
  - goal G, a subset of P.
- The state space of a problem consists of all subsets of the propositions P.
- A transition between two states is any valid application of an action, that is, one whose preconditions are satisfied.

FF Search Strategy
FF uses a strategy called enforced hill-climbing:
- Obtain a heuristic estimate of the value of the current state.
- Find action(s) transitioning to a better state.
- Move to the better state.
- Append the actions to the plan head.
- Never backtrack over any choice.

[Figure: a search tree from Init through states S1-S6, with h(S1) < h(S4) < h(init) < h(S2) < h(S3) < h(S5) = h(S6); actions A and then B lead to successively better states. Plan head: A, B.]

Finding a Better State: Plateaus
- Perform breadth-first search from the current state over states reachable by action applications, stopping as soon as a strictly better one is found.

[Figure: BFS from S6 across a plateau where h(S7) < h(S6) = h(S8) = ... = h(S10) < h(S11) < h(S12); the search stops at the strictly better state S7.]

Enforced Hill-Climbing (cont.)
- The success of this strategy depends on how informative the heuristic is.
- FF uses a heuristic found to be informative in a large class of benchmark planning domains.
- The strategy is not complete: never backtracking means that some parts of the search space are lost.
- If FF fails to find a solution using this strategy, it switches to standard best-first search (e.g., greedy or A* search).

Outline
- Introduction to FF
- FF Search Algorithm
- FF Heuristic Fn
- FF Example
- Appendix: HSP

FF's Heuristic Estimate
- The value of a state is a measure of how close it is to a goal state.
- This cannot be determined exactly (too hard), but it can be approximated.
- One way of approximating is to solve a relaxed problem.
- Relaxation is achieved by ignoring the negative (delete) effects of the actions.
- The relaxed action set A' is defined by: A' = { <pre(a), add(a), {}> | a in A }.

Distance Estimate Extracted from a Relaxed Plan Graph
- Layers correspond to successive time points.
- The number of layers indicates the minimum time to achieve the goals.

[Figure: relaxed plan graph for current state {In(A), Closed}, goal In(B):
Layer 1: In(A), Closed --Open + noops--> Layer 2: In(A), Closed, Opened --Move, Close, Open + noops--> Layer 3: In(A), Closed, Opened, In(B).]

Building the Relaxed Plan Graph
- Start at the state being evaluated.
- Repeatedly apply all relaxed actions whose preconditions are satisfied, asserting their (positive) effects in the next layer.
- If all actions have been applied and the goals are not all present in the final graph layer, then the problem is unsolvable.

[Figure: the relaxed plan graph above, repeated.]

Extracting a Relaxed Soln
- When a layer containing all of the goals is reached, FF searches backwards for a plan.
- The first possible achiever found is always used to achieve each goal; this maximizes the possibility of exploiting actions already in the relaxed plan.
- The relaxed plan may contain many actions happening concurrently at a layer.
- The number of actions in the relaxed plan is an estimate of the true cost of achieving the goals.

Distance Estimate Extracted from a Relaxed Plan Graph
- Layers correspond to successive time points.
- The number of layers indicates the minimum time to achieve the goals.

[Figure: the relaxed plan graph for current state {In(A), Closed} and goal In(B), repeated.]

How FF Uses the Heuristic
- FF uses the heuristic to estimate how close each state is to a goal state, i.e., any state satisfying the goal propositions.
- The actions in the relaxed plan are used as a guide to which actions to explore when extending the plan.
- All actions in the relaxed plan at the first layer that achieve at least one of the (sub)goals required at the second layer are considered helpful.
- FF restricts attention to the helpful actions when searching forward from a state.

Distance Estimate Extracted from a Relaxed Plan Graph
- Helpful actions: Open.

[Figure: the relaxed plan graph repeated, with the first-layer action Open highlighted as helpful.]

Properties of the Heuristic
- The relaxed plan that is extracted is not guaranteed to be the optimal relaxed plan, so the heuristic is not admissible.
- FF can therefore produce non-optimal solutions.
- Focusing only on helpful actions is not completeness-preserving.
- Enforced hill-climbing is not completeness-preserving.

Getting Out of Dead-Ends
- Because FF does not backtrack, it can get stuck in dead-ends.
- A dead-end arises when an action cannot be reversed: having entered a bad state, there is no way to improve.
- When no search progress can be made, FF switches to best-first search from the initial state.
- Detecting a dead-end can be expensive if the plateau is large.

Outline
- Introduction to FF
- FF Search Algorithm
- FF Heuristic Fn
- FF Example
- Appendix: HSP

Example: Polish
Move from room x to room y
- pre: robot is in x, door open
- add: robot is in y
- del: robot is in x
Open door
- pre: door is closed
- add: door is open
- del: door is closed
Close door
- pre: door is open
- add: door is closed
- del: door is open
Polish
- pre: door open
- add: floor polished
Initial State
- In(A)
- Closed
Final State
- In(B)
- Closed
- Polished

FF Example: Goal: In(B), Closed, Polished
(Note: for simplicity, each operator is drawn only in the first layer in which it is introduced.)

- Node 1: {In(A), Closed}. Relaxed graph: Layer 1 {In(A), Closed} --Open--> Layer 2 {In(A), Closed, Opened} --Move(A,B), Polish, Close--> Layer 3 {In(A), In(B), Closed, Opened, Polished}. Estimate: 3 (the number of actions in the relaxed plan). Helpful actions: Open (actions used in the first layer, used to create children).
- Node 2 = Open applied to Node 1: {In(A), Opened}. Estimate: 3 — no improvement, so FF is on a plateau and performs BFS. Helpful actions: Move(A,B), Close, Polish.
- Node 3 = Move(A,B) applied to Node 2: {In(B), Opened}. Estimate: 2 — off the plateau. Helpful actions: Close, Polish.
- Node 4 = Close applied to Node 3: {In(B), Closed}. Estimate: 2 — no improvement.
- Node 5 = Polish applied to Node 3: {In(B), Opened, Polished}. Estimate: 1. Helpful action: Close.
- Applying Close to Node 5 yields {In(B), Closed, Polished}. Estimate: 0 — the goal is reached.

Plan: Open, Move(A,B), Polish, Close

Fast Forward (FF)
- Forward-chaining heuristic search planner.
- Basic principle: hill-climb through the space of problem states, starting at the initial state.
- Each child state results from applying a single plan operator.
- Always move to the first child state found that is closer to the goal.
- Record the transitions applied along the path.
- The transitions leading to the goal constitute a plan.

Outline
- Introduction to FF
- FF Search Algorithm
- FF Heuristic Fn
- FF Example
- Appendix: HSP

Other Distance Estimates
- Distance to the goal can be estimated without building a relaxed reachability graph and then extracting a relaxed plan.
- An alternative is to estimate the cost of achieving a goal as the cost of achieving the preconditions of a suitable action, plus one.
- The cost of achieving any goal not already true can be initialized to infinity, and then updated in every state.

HSP
- Developed by Blai Bonet and Hector Geffner at Simon Bolivar University, Venezuela.
- Hill-climbing search based on random selection from the set of the best successor states.
- Heuristic evaluation (additive): estimate the relaxed distance to the goal set using:
    g_s+(C) = sum_{q in C} g_s(q)
- That is: the cost of achieving a collection of atoms C from state s is the sum of the costs of achieving each of those atoms from state s.

Search Strategy of HSP
- Starting at the initial state, move forward by hill-climbing, using the additive heuristic measure to evaluate states.
- The next state visited is one strictly closer to the goal; ties are broken randomly.
- A threshold is used to limit the time spent exploring plateaus; the search is restarted if the threshold is exceeded.

Properties of the Heuristics
- The additive heuristic g_s(C) pessimistically assumes that the individual atoms have to be achieved independently.
- This means g_s(C) is not admissible, because it will often over-estimate the distance to a goal.
- An alternative, admissible heuristic is:
    g_s(C) = max_{q in C} g_s(q)
- This is admissible but uninformative: it greatly under-estimates distances, so it is of little use in practice.

HSP Heuristic Estimate
- The cost of achieving a single atom q from a state s is given by evaluating:
    g_s(q) := min( g_s(q), 1 + g_s(Prec(op)) )
- That is: the minimum of the currently known cost (if q is already asserted in s) and one plus the cost of achieving the preconditions of an action op that would assert q in the next state.
- g_s(q) is initialized to infinity (or 0 if it is true in the initial state) and is updated in every state.

Differences between FF and HSP
- FF uses a heuristic evaluation based on the number of actions in an explicit relaxed plan.
- HSP uses weight values which approximate (but over-estimate) the length of a relaxed plan.
- FF uses the relaxed plan to prune action choices.
- FF terminates the BFS for a successor state as soon as an improvement is found; HSP selects successors randomly from the set of best states.
- FF defaults to complete best-first search from the initial state if the enforced hill-climbing strategy fails.

Conclusions
- FF and HSP are forward-chaining planners that use a hill-climbing strategy based on relaxed distance estimates.
- FF uses a number of pruning heuristics that can be very powerful (especially in the simply-structured propositional benchmark domains).
- HSP has been extended in a number of ways to exploit alternative search strategies.
- HSP is better suited to some problems than FF, but FF has overall better performance.
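Appendix note: the relaxed-plan heuristic described in these slides can be sketched in a few dozen lines of Python on the Polish domain. This is an illustrative simplification, not FF's actual implementation: the `Action` encoding and function names are mine, and the backward extraction simply uses the first achiever found at each layer, as the slides describe.

```python
from collections import namedtuple

# STRIPS action: preconditions, add list, delete list (all sets of atoms).
Action = namedtuple("Action", ["name", "pre", "add", "delete"])

# The Polish domain from the slides (rooms A and B).
ACTIONS = [
    Action("Open",      {"Closed"},          {"Opened"},   {"Closed"}),
    Action("Close",     {"Opened"},          {"Closed"},   {"Opened"}),
    Action("Move(A,B)", {"In(A)", "Opened"}, {"In(B)"},    {"In(A)"}),
    Action("Move(B,A)", {"In(B)", "Opened"}, {"In(A)"},    {"In(B)"}),
    Action("Polish",    {"Opened"},          {"Polished"}, set()),
]

def relaxed_plan_length(state, goal, actions):
    """FF-style heuristic: number of actions in a relaxed plan
    (deletes ignored), or None if the goals are unreachable
    even under the relaxation."""
    # Forward phase: build fact layers until all goals appear.
    layers = [frozenset(state)]
    applied = [[]]  # actions applicable when building each layer
    while not goal <= layers[-1]:
        acts = [a for a in actions if a.pre <= layers[-1]]
        facts = set(layers[-1])
        for a in acts:
            facts |= a.add            # positive effects only (relaxation)
        if facts == set(layers[-1]):
            return None               # fixpoint without the goals
        applied.append(acts)
        layers.append(frozenset(facts))

    # Backward phase: for each (sub)goal, use the first achiever found;
    # goals already true one layer earlier are carried back by a "noop".
    plan = set()
    pending = {g: len(layers) - 1 for g in goal}
    for i in range(len(layers) - 1, 0, -1):
        for g in [g for g, lvl in pending.items() if lvl == i]:
            if g in layers[i - 1]:
                pending[g] = i - 1    # noop: already true earlier
                continue
            achiever = next(a for a in applied[i] if g in a.add)
            plan.add((i, achiever.name))
            for p in achiever.pre:    # preconditions become subgoals
                pending[p] = min(pending.get(p, i - 1), i - 1)
    return len(plan)

print(relaxed_plan_length({"In(A)", "Closed"},
                          {"In(B)", "Closed", "Polished"}, ACTIONS))  # -> 3
```

Run on the states of the worked example, this reproduces the slide estimates: 3 for {In(A), Closed}, 2 for {In(B), Opened}, 1 for {In(B), Opened, Polished}, and 0 at the goal.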
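HSP's additive heuristic, by contrast, needs no explicit graph: the atom costs g_s(q) can be computed as a fixpoint, as sketched below on the same Polish domain. The encoding and names are again illustrative, not HSP's actual code.

```python
import math

# Relaxed actions as (name, preconditions, adds); deletes are ignored.
ACTIONS = [
    ("Open",      {"Closed"},          {"Opened"}),
    ("Close",     {"Opened"},          {"Closed"}),
    ("Move(A,B)", {"In(A)", "Opened"}, {"In(B)"}),
    ("Move(B,A)", {"In(B)", "Opened"}, {"In(A)"}),
    ("Polish",    {"Opened"},          {"Polished"}),
]

def hsp_add(state, goal, actions):
    """Additive heuristic: g(q) = 0 if q holds in state, otherwise
    min over achievers op of 1 + sum of g over pre(op), iterated to
    fixpoint; the estimate is the sum of g over the goal atoms."""
    atoms = set().union(state, goal, *(pre | add for _, pre, add in actions))
    g = {q: (0 if q in state else math.inf) for q in atoms}
    changed = True
    while changed:
        changed = False
        for _, pre, add in actions:
            cost = 1 + sum(g[p] for p in pre)   # cost of one application
            for q in add:
                if cost < g[q]:
                    g[q] = cost
                    changed = True
    return sum(g[q] for q in goal)              # additive: sum over goals

print(hsp_add({"In(A)", "Closed"}, {"In(B)", "Closed", "Polished"}, ACTIONS))  # -> 4
```

Note the over-counting the slides warn about: from {In(A), Closed} the additive estimate is 4 (Open is effectively paid for by both In(B) and Polished), while the explicit relaxed plan contains only 3 actions.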