Planning as
Heuristic Forward Search
Brian C. Williams
March 31, 2004
16.412J/6.834J
Slides with help from:
Prof. Maria Fox
Readings in Planning as
Forward Heuristic Search
• "The FF Planning System: Fast Plan Generation Through Heuristic Search," by Jörg Hoffmann and Bernhard Nebel, Journal of Artificial Intelligence Research, 2001.
• "Planning as Heuristic Search," by Blai Bonet and Hector Geffner, Artificial Intelligence Journal, 2001.
Outline
• Introduction to FF
• FF Search Algorithm
• FF Heuristic Fn
• FF Example
• Appendix: HSP
Example: Polish
Move from room x to room y
  • pre: robot is in x, door open
  • add: robot is in y
  • del: robot is in x
Open door
  • pre: door is closed
  • add: door is open
  • del: door is closed
Close door
  • pre: door is open
  • add: door is closed
  • del: door is open
Polish from room x to room y
  • pre: door open
  • add: floor polished
Initial State
  • In(A)
  • Closed
Final State
  • In(B)
  • Closed
  • Polished
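For concreteness, these operators can be grounded into a small STRIPS-style action set. The sketch below is illustrative only: the tuple encoding and helper name are this note's convention, not FF's actual input syntax.

```python
def polish_domain():
    """Ground the four operators for rooms A, B and a single door.
    Each ground action is a (name, pre, add, delete) tuple of frozensets."""
    actions = []
    for x, y in [("A", "B"), ("B", "A")]:
        actions.append((f"Move({x},{y})",
                        frozenset({f"In({x})", "Opened"}),
                        frozenset({f"In({y})"}),
                        frozenset({f"In({x})"})))
    actions.append(("Open", frozenset({"Closed"}),
                    frozenset({"Opened"}), frozenset({"Closed"})))
    actions.append(("Close", frozenset({"Opened"}),
                    frozenset({"Closed"}), frozenset({"Opened"})))
    # Polish has no delete list: it only adds "Polished".
    actions.append(("Polish", frozenset({"Opened"}),
                    frozenset({"Polished"}), frozenset()))
    return actions

initial = frozenset({"In(A)", "Closed"})
goal = frozenset({"In(B)", "Closed", "Polished"})
```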
Planning as
Forward Heuristic Search
• Planning can be seen as a state-space search for a path from the initial state to a goal state.
• Planning research has largely not been concerned with finding optimal solutions,
  • although heuristics give preference to shorter plans.
• Planning research has largely used incomplete or uninformed search methods:
  • breadth-first search,
  • meta-search rules.
• The size of most state spaces requires informative heuristics to guide the search.
Review: Search Strategies
• Breadth-first search (uninformed)
  • Systematic search of the state space in layers.
• A* search (informed)
  • Expands the search node with the best estimated cost.
  • Estimated cost = cost-so-far + optimistic-cost-to-go.
• Greedy search
  • Expands the search node closest to the goal according to a heuristic function.
• Hill-climbing search
  • Moves towards the goal by random selection from the best children.
• To apply informed search to planning, we need a heuristic function.
Fast Forward (FF)
• Forward-chaining heuristic search planner.
• Basic principle: hill-climb through the space of problem states, starting at the initial state.
  • Each child state results from applying a single plan operator.
  • Always moves to the first child state found that is closer to the goal.
  • Records the operators applied along the path.
• The operators leading to the goal constitute a plan.
Outline
• Introduction to FF
• FF Search Algorithm
• FF Heuristic Fn
• FF Example
• Appendix: HSP
Planning Problem and
State Space
• A planning problem is a tuple <P, A, I, G>:
  • propositions P,
  • ground actions A (instantiated operators),
  • initial state I, a subset of P, and
  • goal G, a subset of P.
• The state space of a problem consists of all subsets of the propositions P.
• A transition between two states is any valid application of an action, that is, one whose preconditions are satisfied.
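A minimal sketch of states and transitions under this definition. A state is a frozenset of propositions; each ground action is a (name, pre, add, delete) tuple of frozensets, a convention chosen here for illustration:

```python
def applicable(state, actions):
    """Return the actions whose preconditions all hold in `state`."""
    return [a for a in actions if a[1] <= state]

def apply_action(state, action):
    """Transition: remove the delete effects, then assert the add effects."""
    name, pre, add, delete = action
    assert pre <= state, "action not applicable"
    return (state - delete) | add

# In the initial Polish state only Open is applicable: the door is closed,
# and Move, Close, and Polish all require it to be open.
open_door = ("Open", frozenset({"Closed"}),
             frozenset({"Opened"}), frozenset({"Closed"}))
state = frozenset({"In(A)", "Closed"})
```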
FF Search Strategy
FF uses a strategy called enforced hill-climbing:
• Obtain a heuristic estimate of the value of the current state.
• Find action(s) transitioning to a better state.
• Move to the better state.
• Append the actions to the plan head.
• Never backtrack over any choice.
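The strategy above can be sketched as a runnable loop. This is an illustrative simplification, not FF's implementation; the heuristic h, the goal test, and the successor function are supplied as parameters:

```python
from collections import deque

def enforced_hill_climbing(init, goal_test, successors, h):
    """Enforced hill-climbing: from the current state, breadth-first
    search until a state with strictly smaller h is found; commit to it
    and never backtrack. `successors(s)` yields (action, state) pairs.
    Returns a plan (list of actions) or None if stuck."""
    state, plan, best = init, [], h(init)
    while not goal_test(state):
        frontier, seen, found = deque([(state, [])]), {state}, None
        while frontier and found is None:
            s, path = frontier.popleft()
            for act, s2 in successors(s):
                if s2 in seen:
                    continue
                seen.add(s2)
                if h(s2) < best:           # strictly better: stop the BFS
                    found = (s2, path + [act])
                    break
                frontier.append((s2, path + [act]))
        if found is None:
            return None  # no better state reachable; FF then falls back
                         # to best-first search
        state, acts = found
        best, plan = h(state), plan + acts
    return plan
```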
[Figure: enforced hill-climbing search tree over states Init and S1–S6, with heuristic values ordered h(S1) < h(S4) < h(init) < h(S2) < h(S3) < h(S5) = h(S6); the committed actions form the plan head: A, B.]
Finding a better state: Plateaus
Perform breadth-first search from the current state over states reachable by action applications, stopping as soon as a strictly better state is found.
[Figure: breadth-first search over a plateau. From S6, actions C and D reach states S7–S12, with h(S7) < h(S6) = h(S8) = … = h(S10) < h(S11) < h(S12); the search stops at S7, the first strictly better state.]
Enforced Hill-Climbing
(cont.)
• The success of this strategy depends on how informative the heuristic is.
  • FF uses a heuristic found to be informative in a large class of benchmark planning domains.
• The strategy is not complete.
  • Never backtracking means that some parts of the search space are lost.
• If FF fails to find a solution using this strategy, it switches to standard best-first search
  • (e.g., greedy or A* search).
Outline
• Introduction to FF
• FF Search Algorithm
• FF Heuristic Fn
• FF Example
• Appendix: HSP
FF’s Heuristic Estimate
• The value of a state is a measure of how close it is to a goal state.
• This cannot be determined exactly (too hard), but it can be approximated.
• One way of approximating is to solve a relaxed problem.
• Relaxation is achieved by ignoring the negative effects of the actions.
• The relaxed action set A' is defined by:

      A' = { <pre(a), add(a), ∅> | a ∈ A }
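In code, the relaxation simply empties the delete list (same illustrative encoding: a ground action as a (name, pre, add, delete) tuple of frozensets):

```python
def relax(action):
    """Relaxed version of a ground STRIPS action: identical name,
    preconditions, and add effects, but an empty delete list."""
    name, pre, add, _ = action
    return (name, pre, add, frozenset())

close = ("Close", frozenset({"Opened"}),
         frozenset({"Closed"}), frozenset({"Opened"}))
relaxed_close = relax(close)  # keeps pre and add, deletes nothing
```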
Distance Estimate Extracted
From A Relaxed Plan Graph
• Layers correspond to successive time points.
• The number of layers indicates the minimum time to achieve the goals.
• Current: In(A), Closed. Goal: In(B).
[Figure: relaxed plan graph. Layer 1 {In(A), Closed}; applying Open (plus noops) gives Layer 2 {In(A), Closed, Opened}; applying Move, Open, Close (plus noops) gives Layer 3 {In(A), Closed, Opened, In(B)}.]
Building the Relaxed Plan
Graph
• Start at the initial state.
• Repeatedly apply all relaxed actions whose preconditions are satisfied.
  • Assert their (positive) effects in the next layer.
• If all actions have been applied and the goals are still not all present in the final graph layer, then the problem is unsolvable.
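The construction can be sketched as follows (illustrative encoding: each action is a (name, pre, add, delete) tuple of frozensets; the delete list is simply ignored, per the relaxation):

```python
def relaxed_layers(state, actions, goal):
    """Build the proposition layers of the relaxed plan graph: apply every
    relaxed action whose preconditions hold in the current layer, asserting
    only add effects. Returns the layers, or None if a fixpoint is reached
    before all goals appear (the problem is then unsolvable)."""
    layers = [frozenset(state)]
    while not goal <= layers[-1]:
        nxt = set(layers[-1])                  # noops carry everything over
        for name, pre, add, delete in actions:
            if pre <= layers[-1]:              # delete list ignored
                nxt |= add
        if nxt == layers[-1]:                  # fixpoint, goals missing
            return None
        layers.append(frozenset(nxt))
    return layers
```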
Extracting a Relaxed Soln
• When a layer containing all of the goals is reached, FF searches backwards for a plan.
• The first possible achiever found is always used to achieve each goal.
  • This maximizes the possibility of exploiting actions in the relaxed plan.
• The relaxed plan might contain many actions happening concurrently at a layer.
• The number of actions in the relaxed plan is an estimate of the true cost of achieving the goals.
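A sketch of the backward extraction (same illustrative tuple encoding; `layers` is the list of proposition layers of the relaxed graph). The number of selected actions is the heuristic value:

```python
def extract_relaxed_plan(layers, actions, goal):
    """Walk the layers backwards. A goal already present in the previous
    layer is carried back by a noop; otherwise the first achiever whose
    preconditions hold in the previous layer is selected and its
    preconditions become subgoals there. Returns the set of
    (layer, action name) pairs; its size is FF's estimate."""
    goals = {len(layers) - 1: set(goal)}
    selected = set()
    for i in range(len(layers) - 1, 0, -1):
        for g in goals.get(i, set()):
            if g in layers[i - 1]:                       # noop suffices
                goals.setdefault(i - 1, set()).add(g)
                continue
            for name, pre, add, delete in actions:       # first achiever
                if g in add and pre <= layers[i - 1]:
                    selected.add((i, name))
                    goals.setdefault(i - 1, set()).update(pre)
                    break
    return selected
```

On the Polish example this selects Open at the first layer and Move(A,B) and Polish at the second, giving the estimate of 3 used later in the example.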
How FF Uses the Heuristic
• FF uses the heuristic to estimate how close each state is to a goal state
  • (any state satisfying the goal propositions).
• The actions in the relaxed plan are used as a guide to which actions to explore when extending the plan.
  • All actions in the relaxed plan at the 1st layer that achieve at least one of the (sub)goals required at the 2nd layer are considered helpful.
• FF restricts attention to the helpful actions when searching forward from a state.
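The helpful-action filter can be sketched as follows (illustrative tuple encoding; `subgoals` stands for the atoms the relaxed plan requires at the second layer):

```python
def helpful_actions(actions, state, subgoals):
    """Actions applicable in the current state whose add effects achieve
    at least one subgoal the relaxed plan requires at the next layer.
    FF restricts successor generation to these actions."""
    return [name for name, pre, add, delete in actions
            if pre <= state and add & subgoals]
```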
Distance Estimate Extracted
From A Relaxed Plan Graph
• Current: In(A), Closed. Goal: In(B).
• Useful actions: Open.
[Figure: the same relaxed plan graph as before; Open is the only Layer-1 action that achieves a subgoal required at Layer 2.]
Properties of the Heuristic
• The relaxed plan that is extracted is not guaranteed to be the optimal relaxed plan:
  • the heuristic is not admissible;
  • FF can produce non-optimal solutions.
• Focusing only on helpful actions is not completeness-preserving.
• Enforced hill-climbing is not completeness-preserving.
Getting Out of Deadends
• Because FF does not backtrack, it can get stuck in dead-ends.
  • These arise when an action cannot be reversed: having entered a bad state, there is no way to improve.
• When no search progress can be made, FF switches to best-first search from the initial state.
  • Detecting a dead-end can be expensive if the plateau is large.
Outline
• Introduction to FF
• FF Search Algorithm
• FF Heuristic Fn
• FF Example
• Appendix: HSP
Example: Polish
(The Polish domain, initial state, and goal are as given earlier.)
Goal: In(B), Closed, Polished
Note: in each relaxed graph below, an operator is listed only at the first layer in which it is introduced.

Node 1: In(A), Closed
  • Relaxed graph: Layer 1 {In(A), Closed}; Open yields Layer 2 {In(A), Closed, Opened}; Move(A,B), Polish, Close yield Layer 3 {In(A), In(B), Closed, Opened, Polished}.
  • Estimate: 3 (the number of actions in the relaxed plan).
  • Useful actions (actions used in the 1st layer, used to create children): Open.

Node 2 (apply Open): In(A), Opened
  • Relaxed graph: Layer 1 {In(A), Opened}; Move(A,B), Polish, Close yield Layer 2 {In(A), In(B), Closed, Opened, Polished}.
  • Estimate: 3 = plateau; perform breadth-first search.
  • Useful actions: Move, Close, Polish.

Node 3 (apply Move(A,B)): In(B), Opened
  • Relaxed graph: Layer 1 {In(B), Opened}; Move(B,A), Polish, Close yield Layer 2 {In(A), In(B), Closed, Opened, Polished}.
  • Estimate: 2 = off the plateau.
  • Useful actions: Close, Polish.

Node 4 (apply Close): In(B), Closed
  • Relaxed graph: Layer 1 {In(B), Closed}; Open yields Layer 2 {In(B), Closed, Opened}; Move(B,A), Polish, Close yield Layer 3 {In(A), In(B), Closed, Opened, Polished}.
  • Estimate: 2; no improvement.

Node 5 (apply Polish from Node 3): In(B), Opened, Polished
  • Relaxed graph: Layer 1 {In(B), Opened, Polished}; Move(B,A), Close yield Layer 2 {In(A), In(B), Closed, Opened, Polished}.
  • Estimate: 1.
  • Useful actions: Close.

Final node (apply Close): In(B), Closed, Polished
  • Estimate: 0; all goals satisfied.

Plan: Open, Move(A,B), Polish, Close
Fast Forward (FF)
• Forward-chaining heuristic search planner.
• Basic principle: hill-climb through the space of problem states, starting at the initial state.
  • Each child state results from applying a single plan operator.
  • Always moves to the first child state found that is closer to the goal.
  • Records the transitions applied along the path.
• The transitions leading to the goal constitute a plan.
Outline
• Introduction to FF
• FF Search Algorithm
• FF Heuristic Fn
• FF Example
• Appendix: HSP
Other Distance Estimates
• The distance to the goal can be estimated without building a relaxed reachability graph and then extracting a relaxed plan.
• An alternative is to estimate the cost of achieving a goal as the cost of achieving the preconditions of a suitable action, plus one.
• The cost of achieving any goal not already true can be initialized to infinity and then updated in every state.
HSP
• Developed by Blai Bonet and Hector Geffner at Simón Bolívar University, Venezuela.
• Hill-climbing search based on random selection from the set of the best successor states.
• Heuristic evaluation (additive): estimate the relaxed distance to the goal set using the following equation:

      g_s(C) = Σ_(q ∈ C) g_s(q)

  • That is, the cost of achieving a collection of atoms C from state s is the sum of the costs of achieving each of those atoms from state s.
Search Strategy of HSP
• Starting at the initial state, move forward by hill-climbing, using the additive heuristic measure to evaluate states.
• The next state visited is one strictly closer to the goal; ties are broken randomly.
• A threshold limits the time spent exploring plateaus; the search is restarted if the threshold is exceeded.
Properties of Heuristics
• The additive heuristic g_s(C) pessimistically assumes that the individual atoms have to be achieved independently.
• This means g_s(C) is not admissible, because it will often over-estimate the distance to a goal.
• An alternative, admissible heuristic is:

      g_s(C) = max_(q ∈ C) g_s(q)

• This is admissible but uninformative:
  • it greatly under-estimates distances,
  • and is of little use in practice.
HSP Heuristic Estimate
• The cost of achieving a single atom q from a state s is given by evaluating:

      g_s(q) = min( g_s(q), 1 + g_s(Prec(op)) )

  • That is, the minimum of the currently known cost (if q is already asserted in s) and the cost of achieving the preconditions of an action op that would assert q in the next state.
• g_s(q) is initialized to infinity (or 0 if it is true in the initial state) and updated in every state.
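Run to a fixpoint over all atoms, the update can be sketched as follows (illustrative tuple encoding of ground actions, unit action costs):

```python
def hsp_add(state, actions, goal):
    """HSP's additive heuristic: iterate g(q) = min(g(q), 1 + g(pre(op)))
    to a fixpoint, where the cost of a set of atoms is the sum of its
    members' costs, then sum the costs of the goal atoms."""
    INF = float("inf")
    props = set(state) | set(goal)
    for name, pre, add, delete in actions:
        props |= pre | add
    g = {q: (0 if q in state else INF) for q in props}
    changed = True
    while changed:
        changed = False
        for name, pre, add, delete in actions:
            c = sum(g[p] for p in pre)   # additive cost of the precondition set
            if c == INF:
                continue                 # some precondition still unreachable
            for q in add:
                if 1 + c < g[q]:
                    g[q] = 1 + c
                    changed = True
    return sum(g[q] for q in goal)
```

On the Polish example, In(B) and Polished each independently charge for opening the door, so the sum over-estimates the relaxed plan length, illustrating the non-admissibility noted above.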
Differences between
FF and HSP
• FF uses a heuristic evaluation based on the number of actions in an explicit relaxed plan.
• HSP uses weight values which approximate (but over-estimate) the length of a relaxed plan.
• FF uses the relaxed plan to prune action choices.
• FF terminates its breadth-first search for a successor state as soon as an improvement is found; HSP selects successors randomly from the set of best states.
• FF defaults to complete best-first search from the initial state if the enforced hill-climbing strategy fails.
Conclusions
• FF and HSP are forward-chaining planners that use a hill-climbing strategy based on relaxed distance estimates.
• FF uses a number of pruning heuristics that can be very powerful (especially in the simply-structured propositional benchmark domains).
• HSP has been extended in a number of ways to exploit alternative search strategies.
• HSP is better suited to some problems than FF, but FF has better overall performance.