The problem. The slider puzzle is played on a 3-by-3 grid with 9 square blocks labeled 1 through 8 (with one
blank). Your goal is to rearrange the blocks so that they are in order. You are permitted to slide blocks
horizontally or vertically into the blank spot. The following shows a sequence of legal moves from an initial
board position (left) to the goal position (right) by first moving 5 UP into the blank spot and then moving 8
LEFT into the blank spot.
initial goal
If you want to try your skills, go to: https://mypuzzle.org/sliding.
Part 1 How to Make Moves
Use the code for the board which is provided (or code of your own). Write the “ShowMe” method which takes a board
and a sequence of moves and shows the boards that result from those moves.
Sample Output:
OriginalBoard Moves LDRUL
1 2 3
4 6 8
7 0 5
L==>
1 2 3
4 6 8
7 5 0
D==>
1 2 3
4 6 0
7 5 8
R==>
1 2 3
4 0 6
7 5 8
U==>
1 2 3
4 5 6
7 0 8
L==>
1 2 3
4 5 6
7 8 0
Part 2 Solving the Problem
For this problem, you are writing a program to SOLVE the puzzle, not to play the puzzle. From a specific
board, you are to illustrate an efficient solution. The file Prog1OUT.txt shows you an example output.
The Intelligence
One way: You could just randomly try moves, each time checking if you have found a solution. While this
would likely work eventually, it could take a while.
Our way: We are going to be more methodical. Instead of randomly picking a particular move, we say, “I
have four (or fewer) possible moves. I don’t know which one to try. I want to try ALL of them. Let me try
each of them and then (later) pick the one that was the best.” We try all choices from each board we reach. We
use a queue to remember all of the partial games we are considering.
A logical tree helps us consider the possibilities. This is a LOGICAL tree, not a coded tree. Starting at “Game
0” you can make one of four moves (UP, DOWN, LEFT, RIGHT). [We would say this is a tree with a four
way branching factor.] We can keep expanding nodes until a “leaf” is the desired (goal) game. That represents
an exhaustive search. However, we haven’t said which order we expand the nodes. Are we going to keep
going deeper in the tree [a depth first search]? This has the advantage of being easy to do (as recursion takes
care of the returning to a previous node), but has some disadvantages.
We are going to approach it slightly differently. Instead of doing one move after another – we are going to
consider all boards reachable with one move, then all boards reachable with two moves, etc. So in the picture
below, we consider Game 0, then Game 1a, Game 1b, Game 1c, Game 1d, and then all the games at the next
level. While some original games have four “next games”, other games have fewer (due to the position of the
blank and because we don’t want to undo the previous move). This is like traversing the logical game tree “by
level”. This is a called a breadth-first traversal. We examine all one move states, then all two move states, etc.
The bad news is recursion can’t help us with this kind of traversal. You will use a queue to store all the
possibilities you want to explore. In order to refresh pointer skills, you must use a linked list
representation of a queue. This needs to be code that you have written.
Below is an example of the boards I would consider. The original game is the root. Notice that I eliminate
moves which take me immediately back to the previous position. Can you explain why? Study the example
below, by marking each arrow by the move that was taken to get to the next board.
Because we try lots of possibilities, we need keep all the game we are working on. We define a state of the
game to be the board and the history of how we got to the board. First, insert the initial state into a queue. Then,
delete the state from the queue, and insert onto the queue all neighboring states (those that can be reached in one
move) of the removed state. Repeat this procedure until you reach the goal state.
Input
Allow the user to either
a. Create a random board. The user will select the number of times to jumble a board (as this controls how difficult the
board will be to solve). Make sure the game is legal.
b. Solve a specific board.
Your code should work correctly for arbitrary N-by-N boards (for any 1 ≤ N ≤ 100), even if it is too slow to solve some of
them in a reasonable amount of time.
Output
Use a file similar to SliderProject.cpp to generate test cases. The File prog1OUT.txt illustrates the format of the output
For each puzzle
1. Show the series of moves needed to solve the puzzle.
2. Show the number of boards placed on the queue and the number of boards removed from the queue.
Hints
Use of assert
One of the biggest problems we have with pointers is following NULL pointers. You will be amazed at how many
problems are solved by being cautious. Before you follow a pointer, always check to see if it is NULL. (In fact, many of
the base cases in recursion are simply checking for a NULL pointer.) If you are absolutely sure the pointer cannot be
NULL, I would still recommend checking it via an assert statement.
To use assert:
#include <assert.h>
If you think something is true, you simply state
assert(statement);
If the statement evaluates to true, nothing happens. If it doesn’t, you abort in a dramatic way (that can’t be missed).
Use of stringstream
For each data structure, I create a “toString” function which gives a printable version of the data structure. This is
better than just using cout, as I can easily write to a different location. Since the only way I print the contents of a data
structure is by calling toString, it is easy to modify the view I want to see.
A stringstream allows you to use the power of input/output operators to create strings for you to use.
The following example shows how I create a string version of the Board.
// Create a printable representation of the Board class
string Board::toString() {
stringstream ss;
for (int i=0; i < SIZE; i++)
{
for (int j = 0; j < SIZE; j++)
ss << board[i][j] << ” “;
ss << endl;
}
return ss.str();
};
What to Turn in:
Submit a zip file containing your software project and your readme.txt file.
Objective: Check the provided starter code to make sure you have the correct prototypes.
Feel free to modify prototypes if something else is more useful. You will notice in the starter
code I have public helper functions which make it possible to call a routine without knowing
the root, but have recursive “worker” routines that depend on knowing the current node.
These MUST BE your own work. Do not copy from anywhere.
In the comments to each function, provide a big-Oh expression for the complexity of the
functions you write, assuming trees are roughly balanced (depth = log(n) for n nodes).
Use recursion where appropriate, but if something isn’t logically recursive, don’t use
recursion.
Documentation of code and identifying the big-Oh is worth four points. Consult the style guidelines.
Build the trees: tree0, tree1, tree2, tree3, tree4, tree5, tree6 as shown in the starter code.
1. (1 point) Change the insert routine in the code so that each node knows how many nodes are in its
left subtree and how many nodes are in its right subtree. Create a count routine which returns
the number of nodes in the tree.
2. (1 points) Write a function, toString, that returns a string containing: a message and the keys (in
order) of a binary search tree, given the root. This should print the tree prettily. It should show
the parent pointers in parentheses and the number of nodes in the left and right subtrees in
square brackets. For example:
Demonstrate the toString function works by printing tree0.
3. (1 point) Write the function makeEmpty to remove all elements from a tree. Be sure to properly
delete the dynamically allocated nodes. Demonstrate this works by making a clone of tree0,
making it empty, and then showing tree0 hasn’t been affected.
4. (1 points) Write a function, fringe, that returns the count of the leaf nodes of a binary search
tree rooted at root. A leaf is a node with no children.
5. (2 points) Write a function to find the inorder predecessor of a node (given a node in the tree).
You should use the parent link to find the predecessor, but an inorder predecessor is not the
Program 2 Page 2
same thing as a parent. This function can be written in fewer than 10 lines of code – but that’s a
guideline, not a requirement. Note, for each test, we find the predecessor of a given node. This
needs to use a routine that can find a predecessor from a specific node (without having to
traverse the tree from the root). In the tree below, the inorder predecessor of 14 is 11. The
inorder predecessor of 15 is 14. The inorder predecessor of 7 is 6.
6. (2 points) Write the function nodesInLevel that returns the total number of nodes on the
specified level. For this problem, the root is at level zero, the root’s children are at level one,
and for any node, the node’s level is one more than its parent’s level.
7. (3 points) Write the function findKthInOrder. Given an integer k and a binary search tree
with unique values, findKthInOrder returns a pointer to the node that contains the kth
element if the elements are in sorted order – the node with the smallest value is returned
if k = 1, the node with the second smallest is returned if k = 2, and so on.
For example, in the tree shown below (t points to the root), findKthInOrder(t,4) returns a
pointer to the node with value 9, findKthInOrder(t,8) returns a pointer to the node with
value 18, and findKthInOrder(t,12) returns NULL since there are only 9 nodes in the tree.
8. (3 points) By definition, the width of a tree is the number of nodes on the longest path
between two leaves in the tree (not considering the direction of the arcs). The diagram
below shows two trees of width 9. The leaves that form the ends of a longest path are
shaded (note that there is more than one path in each tree of length nine, but no path
longer than nine nodes).
It can be shown that the width of a tree T is the largest of the following quantities:
• The width of T’s left subtree
• The width of T’s right subtree
Program 2 Page 3
• The longest path between leaves that goes through the root of T (this can be
computed from the heights of the subtrees of T)
Here’s code that’s almost a direct translation of the three properties above (assuming
the existence of a standard O(1) max function that returns the larger of two values).
// returns width of tree rooted at t
int width (TreeNode * t)
{
if (t == NULL) return 0;
int leftW = width (t->left);
int rightW = width(t->right);
int rootW = height(t->left) + height(t->right) + 1;
return max(rootW, max(leftW, rightW));
}
However, the function as written does not run in O(n) time because computing height is O(n)
work. Write a version of width that runs in O(n) time. Hint, use a function as described
below.
int widthHeight(TreeNode * t, int & height)
// pre: t is a binary tree
// post: return (via reference param) height = height of t
// return as value of function: width of t
{
}
Program 2 Page 4
9. (2 points) Two binary trees s and t are isomorphic if they have the same shape; the
values stored in the nodes do not affect whether two trees are isomorphic. In the diagram
below, tree a and tree b are isomorphic. Tree c is not isomorphic to the others.
10. (2 points) Two trees s and t are quasi-isomorphic if s can be transformed into t by
swapping left and right children of some of the nodes of s. The values in the nodes are not
important in determining quasi-isomorphism, only the shape is important. For the trees
below tree a and tree b are quasi-isomorphic because if the children of the nodes 10 in
tree a on are swapped, the tree having the same shape as tree b is obtained. All three
trees below are quasiIsomorphic.
Write a function isQuasiIsomorphic that returns true if two trees are quasi-isomorphic.
11. (2 points) Write a function which returns the least common ancestor of a node in a binary
search tree. A least common ancestor is an ancestor of both nodes and is closest to the
nodes. A node is considered to be an ancestor of itself. In the tree below, the least
common ancestor of 82 and 8 is 20. The least common ancestor of 42 and 50 is 50 (even
though 42 doesn’t exist). The least common ancestor of 57 and 40 is 50.
PART 1
You have been given the AVL tree code from your author as a starting point. Reading code is a valuable
experience. On the job, you are often asked to modify/maintain existing code. You can’t start over and
do it your way. You must incorporate your changes into the exisiting code. You are expected to
understand the code that has been given to you. Make the following changes:
1. Change all variable names to be meaningful names.
2. Make any changes needed to conform to our style guidelines.
3. Write a toString function which creates an indented version of the tree (as in program 2).
4. Implement the removeMin function. Remove the smallest node in the tree and rebalance. This
function is to be your own work, not copied from ANY other source.
The data to be stored in the AVL tree is to be generic and work for either an integer or a GameState (of
the slider puzzle variety).
Illustrate that the AVL tree is working properly by printing the tree (using toString) after each of the
following groups of operations:
• Add: 1 3 5 7 9 11 2 4 8 (now print tree)
• Remove 7 9(now print tree)
• Add 50 30 15 18 (now print tree)
• Remove Min (now print tree)
• Remove Min (now print tree)
• Remove Min (now print tree)
• Add 17(now print tree)
PART 2
Use the AVL tree as a priority queue to improve your solution to the slider puzzle problem (of program
1). This is the idea. Suppose instead of looking at solutions in the regular order (which we used in
program 1), what if you considered partial solutions which look closer? For example, which of the
following is the closest?
A priority queue is a queue such that insertions are made in any order, but when you remove a node
from the queue, it is the one with the best priority. Our priority will be an estimation of the total
amount of work needed to solve the problem – so a lower score is preferred. Thus, we will first consider
boards that “look better” to us.
In our previous solution, (program 1) , we considered all one-move solutions before considering all two
move solutions, and so on. If we are smarter in which solutions we consider, we can improve the
functioning of the solution. In this assignment, you will compare your previous solution to this new
technique.
We define a state of the game to contain at least:
• board
• cost so far: number of moves taken from initial state to reach current board
• estimated cost to reach a solution
• priority (cost so far + estimated cost to reach a solution)
• String of moves indicating how the current board was achieved.
Best-first search. We describe a solution to the problem that illustrates a general artificial
intelligence methodology known as the A* search algorithm.
First, insert the initial state into a priority queue. Then, delete from the priority queue the state
with the smallest priority, and insert onto the priority queue all neighboring states (those that can
be reached in one more move) using an appropriate estimated cost for each. Repeat this
procedure until the state removed from the priority queue is the goal state. The success of this
approach hinges on the choice of estimated cost or priority function.
You can compute your “expected work function” any way you want as long it is reasonable
and underestimates the real cost. Be creative. Here are two choices.
Manhattan distance: For each number (other than the blank), give each entry a score showing
how far it needs to be slid to get in the right location. (This is termed a Manhattan distance after
the gridlike street geography of Manhattan.)
Hamming distance: The number of tiles in the wrong position. Intuitively, a search node with a
small number of tiles in the wrong position is close to the goal.
The diagram below shows how the Manhattan distance could be used to aid the search.
We make a key observation: to solve the puzzle from a given state on the priority queue, the total
number of moves required (including those already made) is at least its priority, using our
distance function.
Consequently, as soon as we dequeue a board which has the goal state, we have not only
discovered a sequence of moves from the initial board to the board associated with the state, but
one that makes the fewest number of moves. When we dequeue, we know everything else we
will enqueue in the future will take at least as many moves to reach the goal. This is true
because our distance is an UNDER-estimate of the number of moves required.
Output:
Since we want to compare this version with our brute force solution in program 1, you will need to have
both methods working. Create a class “SliderGame” which allows you to call either “bruteForceSolve”
or “aStarSolve” on the same board. Each method should keep track of the total number of states that
were pulled off the queue.
1. Show the output for the four boards shown above under the “Part 2” heading using both
methods (brute force and A*). Your output should include
a. Which method was used
b. the initial board
c. the sequences of moves used to solve the board (i.e., URRDLL)
d. the “show me” sequence which shows what the board looks like after each move.
e. total number of boards dequeued from the queue.
2. Find an example for which your “more intelligent” method saves significant time over the brute
force method of program 1. Show the same output for this board.
Hints
During debug, when using the A* search, show each board as you pull it off the queue. Be sure to print
all the data in the board state (history, priority, expected moves remaining).
Print output both to the console and to a file. Then, you can easily remove the printing to the
console (to make it run faster and easier to read).
Getting the same code to work for ints and GameStates forces us to be more methodical in our
approach. For example:
The toString function, it looks something like:
string toString( AvlNode *t, string indent) const
{
stringstream ss;
if( t != nullptr )
{
ss << toString(t->right, indent + ” “);
ss << indent <<t->element << endl;
ss << toString(t->left, indent + ” “);
}
return ss.str();
}
This works fine if element is an int, but not so great if element is a GameState. How do you get this
to work? Ints don’t have an element component. GameStates don’t know how to print themselves.
AHHHHH. You need to overload << for GameStates.
It is done like:
ostream& operator<<(ostream& ss, const GameState& gs) {
ss << gs.toString() << endl;
return ss;
}
Note however, that this cannot be a member function of GameState because the first parameter of <<
needs to be an ss. When you make something a member function of GameState, the first (understood)
parameter is GameState. Right?
As an example: To overload == for the class MyClass:
bool MyClass::operator==(const MyClass &other) const {
… // Compare the values, and return a bool result.
}
This allows us to do a==b, where both a and b are members of MyClass. == is
a binary operator, but only the second argument is listed as a parameter.
The “understood” (first) argument is a member of the class.
Thus, if we need to overload << and use it as ss << gameState, the << would
HAVE to be a member function of ostream not of gameState. Since we have no
access to ostream, we are stuck making it a non-member function.
What to Turn in:
Submit a zip file containing your software project . The zip file should contained a readme file which
tells how to run the program (what IDE you used).
Part 1: Becoming familiar with the code
HashTable code has been given to you. No testing program has been provided. To
become familiar with how the code works, try reading in a small input file and
make sure you can create a hash table of those entries.
Make sure the following works:
a. Insert values
b. Delete values
c. Find values
d. Printing the contents of the hash table.
e. How can you control the size of the hash table?
What happens if you attempt to delete an item that isn’t there?
What if you add more things than can fit into the hash table?
Part 1 is for your benefit. Nothing needs to be turned in from this part.
Part 1 exposes a couple of “gotchas” in the code.
a. The code to print the contents relies on a toString. Obviously, this won’t
work for built-in types. If you are testing the code with strings, you’ll see
this. It would be better to overload << for any types you want to use as
HashRecords with the hash table.
b. The hash table is expecting HashRecord pointers. This has some
advantages in that you don’t have to copy whole records and can easily
return a NULL for “not found”. However, you will need to be careful.
Consider what happens when you do something like:
Pair wordFrequency;
cin >> wordFrequency.word;
wordFrequency.ct = 0;
hashTab.insert(wordFrequency.word, &wordFreqency)
Depending on where wordFreqency is defined you could have one of the following
problems:
a. Every entry in the hash table shares the same address, so if you change one
Pair, you change all Pairs.
b. You gave the hashTable the address of a local variable (which was destroyed
upon exit from the method).
Instead, you need to do something like:
cin >> word;
Pair *wordFrequency = new Pair(word,0);
hashTab.insert(word, wordFreqency)
Part 2 Using the code to solve a bigger problem
Your favorite word game is getting a little old. You decide to create “house rules” to reward creativity. This is
what you decide on. You will score each word you generate based on (1) length of word (2) value of each letter in
the word (3) bonus (associated with infrequently used word).
Length LengthValue
1-2 0
3 1
4 2
5 3
6 4
7 5
8 or more 6
Times Used
Before in game
bonus
0 5
1-5 4
6-10 3
11-15 2
more than 15 1
The formula to compute the score for a word is:
score(word) = (sum of letter values) * (lengthValue)*bonus
So suppose you use the word “ozone” for the first time. It would have the score of (1+10+1+1+1)*3*5 = 210
If you used “via” for the 12th time, it would have the score of (4+1+1)*1*2 = 12 points.
Use a hash table to record how many times each word in the input file has been seen before.
Specifics
You have been given the hash table code from your text for doing quadratic probing. Modify the
code in at least two ways:
a. Use your own hash function, patterned after code we used in class rather than the built-in
hash function that is used in the code.
b. Instead of quadratic probing, use double hashing.
Please retain the template structure of the hash table. Be careful not to add anything to the hash
table that only makes sense for this specific problem.
Input
Each input file will consist of a list of words which you are to score.
Output
For each input file, generate the score for the game. The score would typically be updated after every word is
input. For simplicity, print the score after every OutputFrequency words and at the end of the game. (Allow
the user to set OutputFrequency.)
For debugging/grading purposes, allow the user to easily request a copy of the first few (50 or so) items of the
hash table be output. No user interface is required. The user can just change the main program.
The hash table print should contain the following information:
1. The hash table should keep track of the total number finds done on the hash table AND the number of probes
required in those finds. (This will help us determine how good the hash function is.)
2. Number of items stored in hash table.
3. Size of hash table. This code changes the size of the code dynamically.
4. Contents of entries in the hash table. This includes the word and the occurrence count.
Files
game0.txt
game1.txt
game2.txt
game3.txt
game4.txt
Hints:
1. To compute the value of a character (a is 1, b is 3, c is 3…), there are a variety of ways of doing this. While
a nested if or case works, you should realize that you can easily turn the character into an int (to be used as
a subscript into a table of values):
char c;
int sub = c-‘a’;
This is classic C coding. You should know it works.
2. There is an elegant way of converting length to the length value as well.
3. Make sure you understand the difference between the hash function and the collision resolution method.
The hash function tells you how to pick your “first choice” for where an entry will be stored. The collision
resolution method tells you what to do when your first choice isn’t available.
4. When you use double hashing, you need to make sure the step for two different keys (that originally hashed
to the same location) is something different.
One way to do this is to create a different hash function for the step. For example, if your regular hash
function is
unsigned int hash(string key)
{unsigned int hashVal=0;
for (i=0; i < key.length(); i++)
hashVal = (hashVal << 5) ^ key[i] ^ hashVal;
return hashVal %TABLESIZE;
}
You could compute step as
unsigned int step(string key)
{unsigned int stepVal=0;
for (i=0; i < key.length(); i++)
stepVal = (stepVal << 7) ^ key[i] ^ stepVal;
return 1+ stepVal %(TABLESIZE-2);
}
Write a program to implement autocomplete for a given set of N strings and nonnegative weights.
That is, given a prefix, find all strings in the set that start with the prefix, in descending order of
weight.
Autocomplete is an important feature of many modern applications. As the user types, the program
predicts the complete query (typically a word or phrase) that the user intends to type. Autocomplete
is most effective when there are a limited number of likely queries. For example, the Internet
Movie Database uses it to display the names of movies as the user types; search engines use it to
display suggestions as the user enters web search queries; cell phones use it to speed up text input.
In these examples, the application predicts how likely it is that the user is typing each query and
presents to the user a list of the top-matching queries, in descending order of weight. These weights
are determined by historical data, such as box office revenue for movies, frequencies of search
queries from other Google users, or the typing history of a cell phone user. For the purposes of this
assignment, you will have access to a set of all possible queries and associated weights (and these
queries and weights will not change).
The performance of autocomplete functionality is critical in many systems.. According to one
study, the application has only about 50ms to return a list of suggestions for it to be useful to the
user. Moreover, in principle, it must perform this computation for every keystroke typed into the
search bar and for every user!
In this assignment, you will implement autocomplete by using binary search to find the set of
queries that start with a given prefix; and using a priority queue of the matching queries to produce
the top k possibilities in descending order by weight.
Part 1: Create a max priority queue as a mergeable leftist heap. A max priority queue is a data
structure that allows at least two operations: insert which does the obvious thing and deleteMax,
that finds, returns, and removes the maximum element in the priority queue. We also want the
heaps to implement a merge operation. Verify your priority queue works by creating a small test
case which shows various combinations of the following operations:
• inserts into two different queues,
• prints the queue prettily (so the tree structure can be seen),
• merges the queues,
• deletes max
Create test cases which allows you (and the grader) to verify this is working.
Part 2: Perform autocomplete.
1. Read in the file and store it in an array of Terms.
2. For each target word (of length r), provide the top “count” matches using the following
procedure
(a) Using a binary search on the array, find all words whose first r letters match that of the target
word. You will need to implement two binary searches,
• //find the first location of key in a[] which matches r locations of key.
int firstIndexOf(Term a[], string target, int r)
• //find the last location of key in a[] which matches r locations of key.
int lastIndexOf(Term a[], string target, int r)
(b) Put all terms found in part (a) into a priority queue. Using weight as a priority, output the
top “count” terms.
(c) Allow merging of queues
Input format
The first line of the SortedWords.txt file is the number of terms in the file. The remaining items are
the word followed by its frequency.
User Interface
Allow the user to input (a) the prefix of a word (b) the number of outputs s/he wants to see (count)
Allow the user to say s/he wants the “and” of two words. This will be accomplished by creating
two priority queues and merging them.
Feel free to add other user options if you want.
Output format The output will look something like:
Substring: acc count=9
according
accept
account
access
accident
accuse
accomplish
accompany
accurate
Substring: acc &all count=11
all
allow
according
accept
account
access
accident
accuse
accomplish
ally
accompany
Part 1: Union Find
Write the code to perform union find (using path compression) on a set of 30
elements. Use a smart union (either by size or by height, your choice.) Create a
series of carefully constructed tests so that you can verify union/find is working
properly and that path compression works. Print out the contents of your array.
Everyone’s tests will be different. You will be graded on how well your tests
demonstrate properly functioning. This ability to test your code (and dig into the
details) is exactly the skill referred to by “Don’t debug in the dark”.
For ease in programming (and for an easy transition to part 2), instead of using the
same variable to be EITHER parent or negative size, consider using an array of
pairs as shown below.
ParentID Size
0 0 1
1 1 1
2 4
3 4
4 4 7
…
19 19 6
Part 2: Percolation
Percolation. Given a porous landscape with water on the surface under what
conditions will the water be able to drain through to the bottom? Scientists have
defined an abstract process known as percolation to model such situations.
The model. We model a percolation system using an N-by-N grid of sites. Each
site is either open or blocked. A full site is an open site that would fill with water if
water were applied to the top of a grid. In other words, a full site can be connected
to an open site in the top row via a chain of neighboring (left, right, up, down)
open sites. We say the system percolates if there is a full site in the bottom row. In
other words, a system percolates if we fill all open sites connected to the top row
and that process fills some open site on the bottom row
The problem. In a famous scientific problem, researchers are interested in the
following question: if sites are independently set to be open with probability p (and
therefore blocked with probability 1 − p), what is the probability that the system
percolates?
Monte Carlo simulation. To estimate the percolation threshold, consider the
following computational experiment:
• Initialize all sites to be blocked.
• Repeat the following until the system percolates:
o Choose a site (row i, column j) uniformly at random among all
blocked sites. [If you select an already open site, try again.]
o Open the site (row i, column j).
• The fraction of sites that are opened when the system percolates provides an
estimate of the percolation threshold.
By repeating this computation experiment T times and averaging the results, we
obtain a more accurate estimate of the percolation threshold.
For example, if sites are opened in a 20-by-20 grid according to the snapshots
below, then our estimate of the percolation threshold is 204/400 = 0.51 because the
system percolates when the 204th site is opened.
In a readme.txt file, answer the following questions regarding the analysis of
running time and memory usage.
Output:
Using a 20-by-20 grid, (1) show the grid after every 50 sites have been opened and
(2) show the grid when it finally percolates. Each site should be printed using the
following notation:
a. a blank space for open
b. X for blocked
c. The edges of the grid are marked with ‘_‘ or ‘|’
Hints:
1. You won’t actually denote sites which are connected to the top (or bottom)
differently – as union/find can’t easily tell you which elements are members
of the same group.
2. Use the smart union find (with path compression) to keep track of the set of
sites that are connected.
Given the groups, you will need to plan to decide how you can tell if a group
percolates. You can do this any way you like, as long as it is efficient. Our
goal with union/find is to have an almost constant time operation. Thus,
anything you add has to maintain this goal.
One idea is to have each root keep needed information. If the root of a
group always stores the lowest and highest rows in the collection, it will be
easy to tell when it percolates.
So, for the case above, the array might contain more information. 57 is the number
of the site pointed to by the red arrow. It is the root of a group of 6 elements which
are between row 6 and 7. The green arrow points to site 4.
ParentID Size lowestRow highestRow
0 0 1 0 0
1 1 1 0 0
2 4
3 4
4 4 22 0 7
…
57 57 6 6 7
Another way of accomplishing the goal is to have a dummy row at the top and a
dummy row at the bottom of the grid. The top dummy row will be unioned into
the same group. The dummy bottom row will be unioned into a different group.
After opening up nodes in the grid, the whole grid percolates if anything in the top
dummy row is in the same group as anything in the bottom dummy row.
3. Since union find works on integers (instead of pairs of integers), you will
want a way of converting between (x,y) and an integer, i. One way to do
this is as shown below:
a. Given x and y i = N*x+y
b. Given i, x = i/N and y = i%N
History: WordNet
Suppose you had a graph between sets of nouns in which an arc A -> B indicates that the set of nouns A is a more specific
case of the set of nouns B. Such a graph would allow us to reason about similarities and differences. The graph below is
an example of what this might look like.
Measuring the semantic relatedness of two nouns. Semantic relatedness refers to the degree to which two concepts are
related. Measuring semantic relatedness is a challenging problem. For example, you consider George W. Bush and John
F. Kennedy (two U.S. presidents) to be more closely related than George W. Bush and chimpanzee (two primates). It
might not be clear whether George W. Bush and Eric Arthur Blair are more related than two arbitrary people. However,
both George W. Bush and Eric Arthur Blair (aka George Orwell) are famous communicators and, therefore, closely
related.
WordNet is a semantic lexicon for the English language that computational linguists and cognitive scientists use
extensively. For example, WordNet was a key component in IBM’s Jeopardy-playing Watson computer system.
Your Assignment:
Given a directed acyclic graph, find the shortest path between any two nodes, the length of the path, and the shortest
common ancestor.
Shortest common ancestor. An ancestral path between two vertices (v and w) in a rooted DAG is a directed path
from v to a common ancestor x, together with a directed path from w to the same ancestor x. A shortest ancestral path is
an ancestral path of minimum total length. We refer to the common ancestor in a shortest ancestral path as a shortest
common ancestor. Note that a shortest common ancestor always exists because the root is an ancestor of every vertex.
Note also that an ancestral path is a path, but not a directed path.
We generalize the notion of shortest common ancestor to subsets of vertices. A shortest ancestral path of two subsets of
vertices A and B is a shortest ancestral path over all pairs of vertices v and w, with v in A and w in B.
Additional performance requirements. For full credit, the methods length() and ancestor() should take time
proportional to the number of vertices and edges reachable from the argument vertices (or better). For example, to
compute the shortest common ancestor of v and w in the digraph below, your algorithm should examine only the
highlighted vertices and edges.
Hint:
Every ancestor of a node v (in the above example) needs to know (1) the length of the shortest path to it from v (2) the
predecessor responsible for that shortest path. If you do a breadth first traversal from v and recorded the best length, you
wouldn’t have to keep changing the length of the best path. Storing the predecessor help you it recreating the best path.
BONUS feature: Outcast detection. Given a list of nouns x1, x2, …, xn, which noun is the least related to the others? To
identify an outcast, compute the sum of the lengths between each noun and every other one:
di = length(xi, x1) + length (xi, x2) + … + length xi, xn)
and return a noun xt for which dt is maximum. Note that because length (xi, xi) = 0, it will not contribute to the sum.
Input:
The input will be the number of nodes, number of edges, and each edge (A-> B is represented by A B)
Testing:
a. Allow the user to select which file he/she will use for input:digraph1.txt, digraph2,txt, or digraph3.txt. The first two
files are provided. digraph3.txt is one you make up.
b. All sets of nodes will be entered by a list of node numbers followed by a negative number.
c. Allow the user to select from the following commands:
i. (10 points) Given two nodes in the graph, find the shortest common ancestor, shortest ancestral path, and
associated length
ii. (10 points) Given two subsets of nodes, find the shortest common ancestor, shortest ancestral path, and
associated length
iii. (5 points) BONUS: Given a set of nodes find the outcast
Here are the two provided input files and their associated graphs.
25
28
1 0
2 0
3 1
4 1
5 2
6 2
7 3
8 3
9 3
10 6
11 5
12 6
13 7
14 7
15 9
16 9
17 10
18 10
19 12
20 12
21 16
22 16
23 20
24 20
13 3
9 4
10 1
23 4
ADDITIONAL BONUS (10 points) If you have time, try doing this for actual noun sets. See the instructions on
the assignment page.
Reviews
There are no reviews yet.