PowerPoint Presentation
Classical Planning
Non-Forward Search
Copyright By Assignmentchef assignmentchef
Planning as SAT
6CCS3AIP Artificial Intelligence Planning
Dr Tommy Thompson
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
Hello and welcome to this chapter as part of the non-forward search segment of the module.In this video were going to discuss another approach of planning that deviates from the traditional forward search and that is planning as a SAT or Boolean satisfiability problem.
Planning Formalisms
Throughout this module, weve stuck to some key concepts of how planning is formulated and explored.
Set-Theoretic Representation (PDDL)
State Transition Model
Forward Search
But there have been alternatives to how we explore a planning problem.
Idea: What if we transform a planning problem into another problem type for which they are already efficient solvers to address it?
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
Throughout the module were running on the primary notion that we have a set-theoretic representation of the world with states comprised of predicates that give us snapshot views of the world.Plus we then have our state transition model that shows us how states transform based on actions performed on them and we rely on these principles to then drive us through the state space through a forward search algorithm such as breadth-first, depth-first, best-first, A*, EHC, and so on.
But there are alternatives to this approach.We already have the likes of RPG and of course GraphPlan from which it was devised, and we have other variations of planning in other chapters of this segment of the module such as PoP and HTN planning.But were actually overlapping with something that is covered in the segment on SAS+ planning, where if we re-encode the problem into another formulation, can we then solve it that way?
Boolean Satisfiability Problems (SAT)
Encoding a problem in such a way that there exists a valid interpretation.
Create a Boolean expression comprised of variables with operators applied to them.
If we can find a valid combination of values for this expression, we can consider that SAT problem to be satisfiable.
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
Boolean Satisfability problems, often also referred to as propositional satisfiability problems which I will now refer to by their abbreviation SAT are a form of deductive reasoning whereby we attempt to encode a problem into a more succinct expression in this case it is as the name implies a boolean expression (or propositional logic formula) .This means we have a series of variables with operators applied to them and we aim to a value to specific variables such that the overall function can return as true.
In the event the function can return as true, we consider the SAT to be satisfiable, given there is a valid configuration of variable values.However, in the event that is not possible, then we consider the function to be unsatisfiable.
So, Ive provided some very trivial examples here on the right hand side.The first, F = A and not B is easily satisfiable.Given if we make A true and B false, then that conjunction means that F will be true.
However, the second SAT problem is not satisfiable.Why is this?Well its asking for a valid combination where A and not A return as true.This is simply impossible.Since if A is true, not A is false and vice versa.This SAT will always return as false.
Now for those of you who studied introduction to artificial intelligence, you will remember the idea of solving constraint satisfaction problems or CSPs.The idea we have tight constraints on the configuration of variables such that we need to find a valid configuration that satisfies our criteria.SAT solving is one form of CSP research and as mentioned in that class, it is possible for a planning problem to be formulated as a constraint satisfaction problem and thats really what were focussing on here.
Why Use SAT for Planning?
SAT solving is a broad and rich research domain in and of itself.
Typically used in software verification, theorem proving, model checking, pattern generation etc.
There are existing SAT solvers established within this community that can be adopted.
So how do we reformulate a planning problem into a SAT problem?
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
So why would you then use SAT as means to solve a planning problem?Well the main reason is that they have significant overlaps.In both cases youre trying to find assignments of variables that yield a positive/true outcome over a given function.In the context of planning, were trying to find a valid configuration of variables at different stages of the planning process such that it yields a true outcome, meaning that there is a combination of states, when put in the right order satisfy the constraints of the actions in the planning domain and enable us to transition from the initial state of the problem to the goal.
And SAT is a fully formed and heavily explored area of computer science in its own right.There are much like planning a variety of different SAT solvers that are approaching the problem in different ways.And it is a problem area used in situations such as software verification and model checking.SAT solving is a research field in and of itself, so why not try and exploit it.
But now that I have hopefully convinced you its a good idea, lets start to look at how we formulate the planning problem as a SAT problem.
Planning as a SAT Problem
Problem: Our planning formulation
Idea: Propositional variables for each state variable.
Create unit clauses that specify
The initial state.
The goal state.
Create clauses to describe how variables can change between two time pointsand .
Solution: Create a SAT formulasuch that
Ifis satisfiable, then a plan exists for the planning problem
Every possible solution forresults in a different valid plan for
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
Okay, so how are new going to actually do this?How does it work.Well if we have this planning problem P, which is comprised of the state transition system sigma, the initial state s-I and the goal state s-g, were going to create
Clauses that express the variables that exist in the initial state s-I as well as what variables are assigned in the goal state.
After that, we also need to have clauses that describe how the variables can possibly change, this is of course are actions.Denoting how a given variable will change between two time points t and t+1.
Once we have these clauses, the trick is to find a SAT formula phiand we then aim to satisfy it.
If we have satisfy phi, then we know that a plan exists for the planning problem P.
But also, every possible satisfiable solution to phi will when processed back into a planning problem provide us with a different valid plan for the original problem.
SAT Encoding: Variables
State Variables
A set of propositional variablesthat encode the state aftersteps of the plan.
There is a finite number of steps of the plan, encoding as : the planning horizon
Action Variables
A set of propositional variablesthat encode the actions applied at thestep of the plan.
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
So were going to need to find a way to encode the information from each of the states as well as the actions, the clauses we mentioned on the previous slide.
We will a set of propositional variables V, that for each variable, denoted v-I, is encoding a piece of information about the current state at a specific stage of the plan, i.
In order for this to work, we have to assume a finite number of steps of the plan, which is up to a limit T.This limit is our planning horizon.Were going to gradually increment the horizon over time enabling for the planning process to actually occur.
Meanwhile, we will also provide a separate set of variables that represent the actions denoted A which will show actions that are applied at a particular step of the plan, again denoted as i.
Now its important to denote the ranges within which these exist.The state encodings run from 0 up until T, which allows us to show v-0, which is the initial state up until V-T which is the most recent state as a result of the most recent action.However, the actions are only from 1 to T, given a-1 is the first action set from v-0, while a-T is the last action set before v-T.So yeah, if you have a planning horizon of 5, then you will have encode 6 state sets and 5 action sets.
State Variables
Our planning formulation
Initial State
Unit clauses encoding the initial state.
Goal State
Unit clauses encoding the goal state.
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
Now if we consider this in the context of our planning formulation, were going to in the case of the initial state create propositional variables for all of the facts that are true at timestep T in the initial state, but also all of the facts that are not true in the initial state.This is important, given were not relying on the closed world assumption here, were making sure to expressly communicate whether facts are true or false at this particular timestep.
Then for the goal state, were going to express what facts are true at the horizon point T.Here like usual were only interested in the facts that are critical to the goal state.No extraneous facts are required for this formalism.
Now you may have already observed that if were trying to encode information at horizon point T, how do we know where that is?Essentially T is going to be an estimate of goal distance is it not?After all, if we have a plan of 5 actions, then it the goal state is only satisfiable at a horizon of 5 right?Unless we consider concurrent actions on a given time point.
Well, as were going to see later in this chapter, were going to use an iterative deepening search to find it.So we will set T to be 1 to start with and then see if the SAT is satisfiable from that point.But were going to get to that in a bit, we still have a lot of ground to cover still before we look at whether or not this is solvable.
State Variables (Example)
Simple Gripper domain problem.
Two locations, one box to move from A to B.
Encoding for T = 1.
Initial State
Goal State
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
So if we consider a simple example, where we have the Gripper domain, with one robot, two free grippers and its trying to move boxes from one room to another.I have set this up so we only have one box just to make it nice and easy for us to process.
In this instance, the initial state is that the robot r is at location A, the box b1 is at location A, and that the left and right grippers are free.
For the SAT encoding at T = 1, we encode the initial state as all of the facts that are true at timestep 0, and we denote the timestep in each of the these statements.So in this case, were show in the middle of the slide that r is at loc A, bt is at locA and the left and right are free at time step 0.But also, that r is not in loc B and that b1 is not in loc B.
Meanwhile, when we then look to encode the goal, which is for b1 to be in room b, then encoding is much simpler to express.
But we still have a lot to do here, next we still need to express how we encode the actions of the planner.
SAT Encoding: Actions
For all actions that exist in the domain, we need to encode the preconditions and the add and delete effects of the action.
But, we need to do it for each time stepup to the planning horizon .
Sub formulas for encoding preconditions, add/delete effects
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
So now that we know how to encode states, what about actions, lets look at the formalism for this.
Now not unlike any other action formalism, we need to encode the preconditions of the actions as well as the add and delete effects.However, this time and this is the big thing here is that were going to do it for each time step up to the planning horizon.
So as we can see here, its all of the facts that exist at a given timestep i, which is at most one less than the horizon given that the effects of the action will be on time step later.
We list all of the facts that are true at timestep i, followed by what facts are true in time step i+1 thanks to the add effects, as well as the facts that are now no longer true, as a result of the delete effects of the action.
Actions (Example)
Gripper Actions (valid in initial state)
Pickup Boxusing Gripperon Robotin Location .
Move Robotfrom Locationto Location .
SAT Encoding (at time step 0)
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
Now if we return to the gripper problem, we have two actions we can execute in the initial state: pickup, where we grab a box with one of the robots grippers or the move action, that allows us to move from one location to another.
So as we can see here in this SAT encoding, we can identify the for a pickup action on time step 0, we need the box to be in the location at timestep 0 and for the the gripper to be free.But subsequently, on the next time step, we have the gripper holding the box, the box is no longer in the location and the gripper is no longer free.
Same applies for our move action.We identify what facts are true at time step 0, in this case that the robot is at the location and the post effects where we add the new fact at time step 1 that the robot is in the new location and of course that old fact is now no longer true at the next time step.
So this covers a lot of the information we require, but theres still a couple things missing, namely were not capturing information about how actions contradict one another which would help us reduce the solution space but also the actions help us encode how information transitions from one time step to another.But what about the information that doesnt change between time steps.So for example, in the case of the pickup action, the location of the robot didnt change between time steps, but I didnt encode that information in time step 1.Also, in the case of the move action, the box isnt mentioned at all, but I should be able to express that box b1 is still in location A at time step one.
This is what is commonly referred to as The Framing Problem
The Framing Problem
We need a way to capture information that does not change between time stepsand .
Two approaches to solving this
Classical Frame Axioms
State which facts are not effected for each action.
Must enumerate for all fact/action pairs where no the fact is not affected by the action.
Relies on one action being executed per time-step so axioms can be applied.
Explanatory Frame Axioms
Instead of listing what facts are not changed, explain why a fact might have changed between two time steps.
i.e. if a fact changes betweenand , then an action at stepmust have caused it.
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
The Framing Problem is the situation where we also need to capture the information that doesnt change between time steps.Otherwise were barrelling towards a misunderstanding as information is not propagated forward to future timesteps because no action induced a change. This is actually an increasingly challenging issue to address as problems get bigger.
How do we model that the information is not changing between states?
There are two common approaches for how to address this: the classical frame axiom formulation and the explanatory frame axiom approach.I will be focussing more on the explanatory approach, but I wanted to briefly introduce you to both concepts.
First of all, in the classical frame axiom approach, we simply have to state which facts are not effected for each action.
But of course this means we need to then list for every fact that isnt affected by an action at a given time step, that this fact should persist into the next time step.This actually has two problems: one that we need to sit there and add that knowledge in there, but also that we need at least one action to be executed per time step.Otherwise we dont know what information holds true about that fact in the next time step.But also, if you have a planning problem where to actions could be executed on the same frame, then you will wind up running the same fact through two frame axioms.
Meanwhile, explanatory axioms go a different route.Instead of listing all the information that doesnt change, instead, we list the ways in which a piece of information could have changed between timesteps.And more precisely, identifying what action would have caused that information to change as we moved between time steps
Explanatory Frame Axioms
Enumerate the set of actions that could result in a fact changing between time steps.
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
So with explanatory frame axioms, were going to look at how a fact changes between two time steps.
In this case if we look at the actions from the example, we identify that in order for the fact that r1 is not a location B on time step 0, but then is in the location at time step 1, then that must mean that the robot was part of a move action at time step 0.Specifically the move action with r1 going from loc A to loc B.
Now this is still rather verbose, given wed still need to list all of the possibilities that could emerge at a given time step.Hence youre going to have a lot of facts changing between time steps, especially as the branching factor of the state space increases.
However, this approach actually enables for parallelism in the planning process, given two actions could be executed in parallel if they have the same preconditions at time step t and their effects dont conflict.You can more easily identify this by catching whether or not an effect of one action appears within one of the explanatory frame axioms for an action that isnt the action youre running against.
But, that said, parallelism is also going to be an issue, given you may result in actions that negate the effects of one another on execution, but hold valid preconditions.So how do we capture this?
Exclusion Axioms
Returning to the notion of mutual exclusion, we identify what actions cannot occur at the same time.
Complete Exclusion Axiom: only one action at a time.
Conflict Exclusion Axiom: prevents invalid actions on the same timestep.
More on this topic partial order planning in another chapter.
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
So we need to return to this ideal of mutual exclusion once again, a topic that we saw when we look at SAS+ planning and more recently in graph plan, where in this instance were going to identify actions that cannot be executed on the same time step.
So we have what are known as compute exclusion axioms, and this will help us identify which actions cannot be executed on the same time step. These constraints will essentially enforce a total order plan, whereby if we list every possible exclusion axiom, then only one action can be executed at the same time step.This also makes sense in the context of our gripper example, given none of the action of picking up boxes or moving the robot can be executed on the same time step.
However, there is also the possibility of using conflict exclusion axioms, whereby two actions conflict if ones either their preconditions contradict or their preconditions are not consistent with their effects.This would allow us to solve plans with in a partial order fashion, which is a
CS: assignmentchef QQ: 1823890830 Email: [email protected]
Reviews
There are no reviews yet.