CS 561a: Introduction to Artificial Intelligence
CS 561,Sessions 4-5
1
This time: informed search
Informed search:
Use heuristics to guide the search
Best first
A*
Heuristics
Hill-climbing
Simulated annealing
CS 561,Sessions 4-5
2
Best-first search
Idea:
use an evaluation function for each node; estimate of desirability
expand most desirable unexpanded node.
Implementation:
QueueingFn = insert successors in decreasing order of desirability
Special cases:
greedy search
A* search
CS 561,Sessions 4-5
3
Romania with step costs in km
374
329
253
CS 561,Sessions 4-5
4
Greedy search
Estimation function:
h(n) = estimate of cost from n to goal (heuristic)
For example:
hSLD(n) = straight-line distance from n to Bucharest
Greedy search expands first the node that appears to be closest to the goal, according to h(n).
CS 561,Sessions 4-5
5
CS 561,Sessions 4-5
6
CS 561,Sessions 4-5
7
CS 561,Sessions 4-5
8
CS 561,Sessions 4-5
9
Properties of Greedy Search
Complete?
Time?
Space?
Optimal?
CS 561,Sessions 4-5
10
Properties of Greedy Search
Complete?No can get stuck in loops
e.g., Iasi > Neamt > Iasi > Neamt >
Complete in finite space with repeated-state checking.
Time?O(b^m) but a good heuristic can give
dramatic improvement
Space?O(b^m) keeps all nodes in memory
Optimal?No.
CS 561,Sessions 4-5
11
A* search
Idea: avoid expanding paths that are already expensive
evaluation function: f(n) = g(n) + h(n)with:
g(n) cost so far to reach n
h(n) estimated cost to goal from n
f(n) estimated total cost of path through n to goal
A* search uses an admissible heuristic, that is,
h(n) h*(n) where h*(n) is the true cost from n.
For example: hSLD(n) never overestimates actual road distance.
Theorem: A* search is optimal
CS 561,Sessions 4-5
12
A* search
A* search uses an admissible heuristic, that is,
h(n) h*(n) where h*(n) is the true cost from n.
Theorem: A* search is optimal
Note: A* is also optimal if the heuristic is consistent, i.e.,
(a consistent heuristic is admissible (by induction), but the converse is not always true)
Actual cost
From N to P
CS 561,Sessions 4-5
13
CS 561,Sessions 4-5
14
CS 561,Sessions 4-5
15
CS 561,Sessions 4-5
16
CS 561,Sessions 4-5
17
CS 561,Sessions 4-5
18
CS 561,Sessions 4-5
19
1
Optimality of A* (standard proof)
Suppose some suboptimal goal G2 has been generated and is in the queue.Let n be an unexpanded node on a shortest path to an optimal goal G1.
CS 561,Sessions 4-5
20
Optimality of A* (more useful proof)
CS 561,Sessions 4-5
21
f-contours
How do the contours look like when h(n) =0?
CS 561,Sessions 4-5
22
Properties of A*
Complete?
Time?
Space?
Optimal?
CS 561,Sessions 4-5
23
Properties of A*
Complete?Yes, unless infinitely many nodes with f f(G)
Time?Exponential in [(relative error in h) x (length of solution)]
Space?Keeps all nodes in memory
Optimal?Yes cannot expand fi+1 until fi is finished
CS 561,Sessions 4-5
24
Proof of lemma: pathmax
CS 561,Sessions 4-5
25
Admissible heuristics
CS 561,Sessions 4-5
26
Admissible heuristics
CS 561,Sessions 4-5
27
Relaxed Problem
Admissible heuristics can be derived from the exact solution cost of a relaxed version of the problem.
If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1(n) gives the shortest solution.
If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives the shortest solution.
CS 561,Sessions 4-5
28
This time
Iterative improvement
Hill climbing
Simulated annealing
CS 561,Sessions 4-5
29
Iterative improvement
In many optimization problems, path is irrelevant;
the goal state itself is the solution.
Then, state space = space of complete configurations.
Algorithm goal:
find optimal configuration (e.g., TSP), or,
find configuration satisfying constraints
(e.g., n-queens)
In such cases, can use iterative improvement algorithms: keep a single current state, and try to improve it.
CS 561,Sessions 4-5
30
Iterative improvement example: vacuum world
Simplified world: 2 locations, each may or not contain dirt,
each may or not contain vacuuming agent.
Goal of agent: clean up the dirt.
If path does not matter, do not need to keep track of it.
CS 561,Sessions 4-5
31
Iterative improvement example: n-queens
Goal: Put n chess-game queens on an n x n board, with no two queens on the same row, column, or diagonal.
Here, goal state is initially unknown but is specified by constraints that it must satisfy.
CS 561,Sessions 4-5
32
Hill climbing (or gradient ascent/descent)
Iteratively maximize value of current state, by replacing it by successor state that has highest value, as long as possible.
CS 561,Sessions 4-5
33
Hill climbing
Note: minimizing a value function v(n) is equivalent to maximizing v(n),
thus both notions are used interchangeably.
Notion of extremization: find extrema (minima or maxima) of a value function.
CS 561,Sessions 4-5
34
Hill climbing
Problem: depending on initial state, may get stuck in local extremum.
CS 561,Sessions 4-5
35
Minimizing energy
Lets now change the formulation of the problem a bit, so that we can employ new formalism:
lets compare our state space to that of a physical system that is subject to natural interactions,
and lets compare our value function to the overall potential energy E of the system.
On every updating,
we have DE 0
CS 561,Sessions 4-5
36
Minimizing energy
Hence the dynamics of the system tend to move E toward a minimum.
We stress that there may be different such states they are local minima.Global minimization is not guaranteed.
CS 561,Sessions 4-5
37
Local Minima Problem
Question: How do you avoid this local minimum?
starting
point
descend
direction
local minimum
global minimum
barrier to local search
CS 561,Sessions 4-5
38
Boltzmann machines
h
The Boltzmann Machine of
Hinton, Sejnowski, and Ackley (1984)
uses simulated annealing to escape local minima.
To motivate their solution, consider how one might get a ball-bearing traveling along the curve to probably end up in the deepest minimum.The idea is to shake the box about h hard then the ball is more likely to go from Dto C than fromC to D.So, on average, the ball should end up inCsvalley.
CS 561,Sessions 4-5
39
Consequences of the Occasional Ascents
Help escaping the
local optima.
desired effect
Might pass global optima
after reaching it
adverse effect
(easy to avoid by
keeping track of
best-ever state)
CS 561,Sessions 4-5
40
Simulated annealing: basic idea
From current state, pick a random successor state;
If it has better value than current state, then accept the transition, that is, use successor state as current state;
Otherwise, do not give up, but instead flip a coin and accept the transition with a given probability (that is lower as the successor is worse).
So we accept to sometimes un-optimize the value function a little with a non-zero probability.
CS 561,Sessions 4-5
41
Demo
CS 561,Sessions 4-5
42
Boltzmanns statistical theory of gases
In the statistical theory of gases, the gas is described not by a deterministic dynamics, but rather by the probability that it will be in different states.
The 19th century physicist Ludwig Boltzmann developed a theory that included a probability distribution of temperature (i.e.,every small region of the gas had the same kinetic energy).
Hinton, Sejnowski and Ackleys idea was that this distribution might also be used to describe neural interactions, where low temperatureTis replaced by a small noise termT (the neural analog of random thermal motion of molecules). While their results primarily concern optimization using neural networks, the idea is more general.
CS 561,Sessions 4-5
43
Boltzmann distribution
At thermal equilibrium at temperature T, the Boltzmann distribution gives the relative probability that the system will occupy state A vs. state B as:
where E(A) and E(B) are the energies associated with states A and B.
CS 561,Sessions 4-5
44
Simulated annealing
Kirkpatrick et al. 1983:
Simulated annealing is a general method for making likely the escape from local minima by allowing jumps to higher energy states.
The analogy here is with the process of annealing used by a craftsman in forging a sword from an alloy.
He heats the metal, then slowly cools it as he hammers the blade into shape.
If he cools the blade too quickly the metal will form patches of different composition;
If the metal is cooled slowly while it is shaped, the constituent metals will form a uniform alloy.
CS 561,Sessions 4-5
45
Simulated annealing in practice
set T
optimize for given T
lower T(see Geman & Geman, 1984)
repeat
Geman & Geman (1984): if T is lowered sufficiently slowly (with respect to the number of iterations used to optimize at a given T), simulated annealing is guaranteed to find the global minimum.
Caveat: this algorithm has no end (Geman & Gemans T decrease schedule is in the 1/log of the number of iterations, so, T will never reach zero), so it may take an infinite amount of time for it to find the global minimum.
CS 561,Sessions 4-5
46
Simulated annealing algorithm
Idea: Escape local extrema by allowing bad moves, but gradually decrease their size and frequency.
Note: goal here is to
maximize E.
CS 561,Sessions 4-5
47
Simulated annealing algorithm
Idea: Escape local extrema by allowing bad moves, but gradually decrease their size and frequency.
Algorithm when goal
is to minimize E.
<–CS 561,Sessions 4-548Note on simulated annealing: limit casesBoltzmann distribution: accept bad move with E<0 (goal is to maximize E) with probability P(E) = exp(E/T)If T is large:E < 0E/T < 0 and very smallexp(E/T) close to 1accept bad move with high probabilityIf T is near 0:E < 0E/T < 0 and very largeexp(E/T) close to 0accept bad move with low probabilityCS 561,Sessions 4-549Note on simulated annealing: limit casesBoltzmann distribution: accept bad move with E<0 (goal is to maximize E) with probability P(E) = exp(E/T)If T is large:E < 0E/T < 0 and very smallexp(E/T) close to 1accept bad move with high probabilityIf T is near 0:E < 0E/T < 0 and very largeexp(E/T) close to 0accept bad move with low probabilityRandom walkDeterministicup-hill50Genetic Algorithms50CS 56151How do you find a solution in a large complex space?Ask an expert?Adapt existing designs?Trial and error?51What has been traditionally done in the heat exchanger world is to take a best guess at a design with the help of an expert. Traditional heat exchanger designs are usually fully mathematically described.This initial guess can then be used as a basis, and various parameters are tweaked. The performance of the heat exchanger can be recalculated to see if the modified design is an improvement on the original one.Surely there must be a better way? There is – we dont need to look very far to find examples of optimisation in Nature.CS 56152Example: Traveling Sales Person (TSP) Classic Example: You have N cities, find the shortest route such that your salesperson will visit each city once and return.This problem is known to be NP-HardAs a new city is added to the problem, computation time in the classic solution increases exponentially O(2n) (as far as we know)DallasHoustonSan AntonioAustinMos EisleyIs this the shortest path???ATexas Sales PersonWhat ifLets create a whole bunch of random sales people and see how well they do and pick the best one(s).Salesperson AHouston -> Dallas -> Austin -> San Antonio -> Mos Eisely
Distance Traveled 780 Km
Salesperson B
Houston -> Mos Eisley -> Austin -> San Antonio -> Dallas
Distance Traveled 820 Km
Salesperson A is better (more fit) than salesperson B
Perhaps we would like sales people to be more like A and less like B
Question:
do we want to just keep picking random sales people like this and keep testing them?
CS 561
53
CS 561
54
Represent problem like a DNA sequence
San Antonio
Dallas
Mos Eisely
Houston
Austin
Dallas
Houston
Mos Eisely
San Antonio
Austin
Each DNA sequence is an encoding of a
possible solution to the problem.
DNA Salesperson A
DNA Salesperson B
The order of the cities in
the genes is the order of
the cities the TSP will take.
54
One the fitness has been assigned, pairs of chromosomes representing heat exchanger designs can be chosen for mating.
The higher the fitness, the greater the probability of the design being selected. Consequently, some of the weaker population members do not mate at all, whilst superior ones are chosen many times.
It is even statistically possible for a member to be chosen to mate with itself. This has no advantage, as the offspring will be identical to the parent.
CS 561
55
Ranking by Fitness:
Here weve created three different salespeople. We then checked to see how
far each one has to travel. This gives us a measure ofFitness
Note: we need to be able to measure fitness in polynomial time, otherwise we are in trouble.
Travels Shortest Distance
55
One the population has been formed, (either randomly in the initial generation, or by mating in subsequent generations), each population member needs to be assessed against the desired properties such a rating is called a fitness.
The design parameters represented by the zeros and ones in the binary code of each chromosome are fed into the mathematical model describing the heat exchanger. The output parameters for each design are used to give the fitness rating. A good design has a high fitness value, and a poor design a lower value.
Lets breed them!
We have a population of traveling sales people. We also know their fitness based on how long their trip is. We want to create more, but we dont want to create too many.
We take the notion that the salespeople who perform better are closer to the optimal salesperson than the ones which performed more poorly. Could the optimal sales person be a combination of the better sales people?
We create a population of sales people as solutions to the problem.
How do we actually mate a population of data???
CS 561
56
CS 561
57
Crossover:
Exchanging information through some part of information (representation)
Once we have found the best sales people we will in a sense mate them. We can do this in several ways. Better sales people should mate more often and poor sales people should mate lest often.
Sales PeopleCity DNA
Parent 1F A B | E C G D
Parent 2D E A | C G B F
Child 1 F A B | C G B F
Child 2D E A | E C G D
Sales person A (parent 1)
Sales person B (parent 2)
Sales person C (child 1)
Sales person D (child 2)
57
The mating process is analogous to crossover carried out in living cells.
A pair of binary strings are used. A site along the length of the string is chosen randomly. In this example it is shown between the 6th and 7th bits, but it could be anywhere.
Both members of the pair are severed at that site, and their latter portions are exchanged. Two parents form two children, and these two daughter designs become members of the population for the next generation.
This process takes place for each pair selected, so the new population has the same number of members as the previous generation.
Crossover Bounds (Houston we have a problem)
Not all crossed pairs are viable. We can only visit a city once.
Different GA problems may have different bounds.
CS 561
58
San Antonio
Dallas
Mos Eisely
Houston
Austin
Dallas
Houston
Austin
San Antonio
Mos Eisely
Dallas
Houston
Mos Eisely
Houston
Austin
San Antonio
Dallas
Austin
San Antonio
Mos Eisely
Parents
Children
Not Viable!!
TSP needs some special rules for crossover
Many GA problems also need special crossover rules.
Since each genetic sequence contains all the cities in the travel, crossover is a swapping of travel order.
Remember that crossover also needs to be efficient.
CS 561
59
San Antonio
Dallas
Mos Eisely
Houston
Austin
Dallas
Mos Eisely
Houston
San
Antonio
Austin
San Antonio
Dallas
Houston
Austin
Mos Eisely
Parents
Children
Viable
Dallas
Houston
Austin
San Antonio
Mos Eisely
What about local extrema?
With just crossover breading, we are constrained to gene sequences which are a cross product of our current population.
Introduce random effects into our population.
Mutation Randomly twiddle the genes with some probability.
Cataclysm Kill off n% of your population and create fresh new salespeople if it looks like you are reaching a local minimum.
Annealing of Mating Pairs Accept the mating of suboptimal pairs with some probability.
Etc
CS 561
60
CS 561
61
In summation: The GA Cycle
Fitness
Selection
Crossover
Mutation
New Population
(see, e.g., Wikipedia on Fitness
proportionate selection)
61
Summary of the previous steps to the model.
Populations are continuously produced, going round the outer loop of this diagram, until the desired amount of optimisation has been achieved.
CS 561
62
Demo
62
Summary of the previous steps to the model.
Populations are continuously produced, going round the outer loop of this diagram, until the desired amount of optimisation has been achieved.
GA and TSP: the claims
Can solve for over 3500 cities (still took over 1 CPU years).
Maybe holds the record.
Will get within 2% of the optimal solution.
This means that its not a solution per se, but is an approximation.
CS 561
63
GA Discussion
We can apply the GA solution to any problem where the we can represent the problems solution (even very abstractly) as a string.
We can create strings of:
Digits
Labels
Pointers
Code Blocks This creates new programs from strung together blocks of code. The key is to make sure the code can run.
Whole Programs Modules or complete programs can be strung together in a series. We can also re-arrange the linkages between programs.
The last two are examples of Genetic Programming
CS 561
64
Things to consider
How large is your population?
A large population will take more time to run (you have to test each member for fitness!).
A large population will cover more bases at once.
How do you select your initial population?
You might create a population of approximate solutions. However, some approximations might start you in the wrong position with too much bias.
How will you cross bread your population?
You want to cross bread and select for your best specimens.
Too strict: You will tend towards local minima
Too lax: Your problem will converge slower
How will you mutate your population?
Too little: your problem will tend to get stuck in local minima
Too much: your population will fill with noise and not settle.
CS 561
65
GA is a good no clue approach to problem solving
GA is superb if:
Your space is loaded with lots of weird bumps and local minima.
GA tends to spread out and test a larger subset of your space than many other types of learning/optimization algorithms.
You dont quite understand the underlying process of your problem space.
NO I DONT: What makes the stock market work??? Dont know? Me neither! Stock market prediction might thus be good for a GA.
YES I DO: Want to make a program to predict peoples height from personality factors? This might be a Gaussian process and a good candidate for statistical methods which are more efficient.
You have lots of processors
GAs parallelize very easily!
CS 561
66
Why not use GA?
Creating generations of samples and cross breading them can be resource intensive.
Some problems may be better solved by a general gradient descent method which uses less resource.
However, resource-wise, GA is still quite efficient (no computation of derivatives, etc).
In general if you know the mathematics, shape or underlying process of your problem space, there may be a better solution designed for your specific need.
Consider Kernel Based Learning and Support Vector Machines?
Consider Neural Networks?
Consider Traditional Polynomial Time Algorithms?
Etc.
CS 561
67
More demos: motorcycle design
CS 561
68
CS 561,Sessions 4-5
69
Summary
Best-first search = general search, where the minimum-cost nodes (according to some measure) are expanded first.
Greedy search = best-first with the estimated cost to reach the goal as a heuristic measure.
Generally faster than uninformed search
not optimal
not complete.
A* search = best-first with measure = path cost so far + estimated path cost to goal.
combines advantages of uniform-cost and greedy searches
complete, optimal and optimally efficient
space complexity still exponential
Hill climbing and simulated annealing: iteratively improve on current state
lowest space complexity, just O(1)
risk of getting stuck in local extrema (unless following proper simulated annealing schedule)
Genetic algorithms: parallelize the search problem
B
C
A
Basin of
Attraction for C
D
E
EMBED Word.Picture.8
_997354951.unknown
)
/
)
(
exp(
)
/
)
(
exp(
)
(
)
(
exp
)
(
)
(
T
A
E
T
B
E
T
B
E
A
E
B
P
A
P
=
=
/docProps/thumbnail.jpeg
Reviews
There are no reviews yet.