CPSC 444 Project 4: Connect Four II
CPSC 444 Artificial Intelligence Spring 2019
CPSC 444 Project 4 Due: Mon 5/6 at 11:59pm
This project deals with implementing a learning game-playing AI, specifically a computer player for Connect Four. The goal is to fill in some of the details that come up when actually trying to apply the algorithms discussed in class, and optionally to think more deeply about what's involved in getting learning to work successfully for a non-trivial problem.
You may work on your own or with a small group of at most two people. The project requirements are the same for individuals as for groups, and both members of a group will receive the same grade. Make sure that if you are in a group, you understand the full project; don't just split things up and only know about your part.
Regardless of whether you are working individually or with another, you may share saved games with others in the class. (See “Finding Supervision for Supervised Learning” below.)
Specifications
Your task: create an effective Connect Four player that has learned how to play instead of being explicitly programmed with a good evaluation heuristic or specific strategies. In addition to implementing several learning strategies, you should evaluate the effectiveness of your players and discuss what went into getting them to learn effectively. Specifically:
For a C, implement two reinforcement learning algorithms: one to learn state utilities (temporal difference) and one to learn (state,action) utilities (either Q learning or SARSA), along with players who can play using these utility values. This means implementing the learn method for TDLearner and either QLearner or SARSALearner, and the chooseMove method for StatePlayer and StateActionPlayer.
For a B, implement a neural network architecture suitable for Connect Four and a player who can learn and play using that network. This means implementing the methods of C4Network, the learn method for C4NNLearner, and the chooseMove method for C4NNPlayer. Your writeup should address the structure of your neural network and the rationale behind your choices.
For an A, make an effort to create effective players. This includes training each of your players to the best performance you can achieve (generate a saved file of state utilities, (state,action) utilities, or neural network weights that represents each player's knowledge), but it is unlikely that simply implementing a learning algorithm will produce a player that learns to play well right away; you will need to do some tweaking. (This can be in the implementation, the learning parameters, the design of the neural network, the training procedures, …) Full credit does not require achieving a strong player (though that is the goal), but you should make a deliberate effort to create an effective one: try some things and provide evidence of how you evaluated the results, but also have well-founded reasons to expect that what you tried would be useful. (Random trial-and-error will not earn much credit.) Describe what you did, why, and how it worked out in your writeup.
For extra credit, go farther. This could be a more thorough exploration of different neural network configurations and training options; implementation, refinement, and evaluation of a third reinforcement learning technique (whichever of Q learning and SARSA you did not already implement); or implementation, training, and evaluation of a minimax player incorporating rote learning to remember previously-computed minimax values. More ambitious options include training a neural net using a genetic algorithm, bringing in other training/learning strategies (such as making use of machine learning to extract a strategy from what your players have learned), making efficiency improvements (such as in the representation of states in the utility databases), bounding the memory usage for the state and (state,action) databases, … You might do some research to get ideas; mention your sources in your writeup. Regardless of what you do, describe what you did and why in your writeup. In all cases, what is crucial for full credit is evidence that what you do is done with purpose: demonstrate an understanding of the material rather than random experimentation.
In addition, I hope to have a tournament involving everyone’s best players, with some extra credit going to the winner / top finishers.
Nuts and Bolts
Your Task
The specifications above indicate where you need to fill in code. There are two main sets of things you need to implement: players who make use of learned information to play the game, and learners who incorporate new information.
As with the minimax players in project 3, implementing a player means implementing chooseMove to select the next move. chooseMove works the same way as in project 3: the result of the method should be that move_ is set to the desired (and legal) move. The move time limit should be much less of a factor than it was with the minimax players, but stop_ should still be respected: if there is anything time-consuming in chooseMove, be sure to check stop_ periodically and exit chooseMove (after setting move_) as promptly as possible if stop_ becomes true.
When selecting the next move, take into account the player's explore_ instance variable, which specifies the probability of the player making an exploratory move rather than the move specified by the learned policy. (explore_ has a value between 0 and 1.) An exploratory move can be a random choice (from amongst the legal moves), or you can use one of the strategies discussed in class. Since whether or not a move is exploratory is a factor in learning, make sure that the Move object constructed to represent the chosen move has the correct setting for its exploratory parameter.
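For concreteness, here is a minimal sketch of what a chooseMove along these lines might look like for StatePlayer. The names it uses for the surrounding framework (the chooseMove signature, board_, db_, rand_, myPiece_, getLegalMoves, childState, getUtility, and the Move(column, exploratory) constructor) are hypothetical placeholders; check the provided classes for the real signatures.

    // Hedged sketch of StatePlayer.chooseMove; framework names are assumptions.
    // assumes: import java.util.List; import java.util.Random;
    public void chooseMove() {
        List<Integer> legal = board_.getLegalMoves();        // columns that are not full

        // With probability explore_, make an exploratory (random) move.
        if (rand_.nextDouble() < explore_) {
            int col = legal.get(rand_.nextInt(legal.size()));
            move_ = new Move(col, true);                     // flagged as exploratory
            return;
        }

        // Otherwise choose greedily: the move whose successor state has the
        // highest learned utility. Check stop_ so we exit promptly if asked.
        int bestCol = legal.get(0);
        double bestUtility = Double.NEGATIVE_INFINITY;
        for (int col : legal) {
            if (stop_) {
                break;                                       // time is up; settle for the best so far
            }
            ConnectFourBoard next = board_.childState(col, myPiece_);  // a copy; board_ itself is unchanged
            double u = db_.getUtility(next);                 // default value if the state is unseen
            if (u > bestUtility) {
                bestUtility = u;
                bestCol = col;
            }
        }
        move_ = new Move(bestCol, false);                    // greedy move, not exploratory
    }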
For the reinforcement learners, the learn method carries out the utility-updating algorithm on the played game provided to the method. You will need to adapt the basic algorithm for two-player games (discussed more below), and don't forget to take into account whether moves are exploratory or not. The reward (for Q learning and SARSA) is 0 for moves that don't result in game over, and is given by the win, loss, and tie parameters to the learn method for moves that result in those outcomes.
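For concreteness, the reward for a single transition might be computed along these lines; isGameOver, iWon, and opponentWon are hypothetical stand-ins for whatever game-over and winner checks the provided classes expose, and win, loss, and tie are the parameters passed to learn.

    // Hedged sketch: immediate reward for a single transition.
    double reward = 0.0;                  // moves that don't end the game
    if (isGameOver) {
        if (iWon) {
            reward = win;
        } else if (opponentWon) {
            reward = loss;
        } else {
            reward = tie;                 // board filled with no winner
        }
    }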
For the neural network player and learner, you will need to decide on and implement the architecture for the network, as well as implementing a player and a learner. The first step is to decide on the representation: what will the inputs be? What will the outputs be? How many hidden layers (0, 1, or 2) and how many neurons per layer? Perceptrons or a sigmoid activation function? Keep in mind the things you've learned about effective representations. Then implement this network in C4Network: create an instance of NeuralNet that reflects your decisions about representation, and implement getInputs to map between a ConnectFourBoard and the corresponding network inputs, getColumn to map between the network outputs and the move (column) they indicate, and getTargetOutputs to map between a move (column) and the outputs indicating that move.
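As one illustration of a possible representation (not the only reasonable one), the sketch below uses two inputs per board cell, one per player, so a 6x7 board gives 84 inputs. The double[] return type, the accessors getRows(), getCols(), and getPiece(r, c), and the PLAYER_ONE / PLAYER_TWO constants are assumptions; adapt them to the actual ConnectFourBoard API.

    // Hedged sketch of C4Network.getInputs using a "two inputs per cell"
    // encoding: one input is 1 if player one's piece is in that cell
    // (else 0), and a second input does the same for player two.
    public double[] getInputs(ConnectFourBoard board) {
        int rows = board.getRows();
        int cols = board.getCols();
        double[] inputs = new double[2 * rows * cols];       // e.g. 84 inputs for a 6x7 board
        int i = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                inputs[i++] = (board.getPiece(r, c) == PLAYER_ONE) ? 1.0 : 0.0;
                inputs[i++] = (board.getPiece(r, c) == PLAYER_TWO) ? 1.0 : 0.0;
            }
        }
        return inputs;
    }

A natural companion to this encoding is one output per column, with getColumn returning the index of the largest output and getTargetOutputs producing a vector that is 1 in the chosen column and 0 elsewhere; other designs (such as a single utility output) are also possible.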
For C4NNLearner, use the train method (from NeuralNet); you do not need to implement backpropagation.
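A minimal sketch of how learn for C4NNLearner might drive training is below: for each position in the played game, build the inputs and target outputs with C4Network and hand them to train. The learn signature, the PlayedGame accessors (numMoves, boardBefore, columnPlayed), the train(inputs, targets) form, and the network_ / net_ fields are all assumptions; check the provided interfaces for the real ones.

    // Hedged sketch of C4NNLearner.learn: train the network toward the move
    // that was actually made in each position of the played game. network_
    // is assumed to be the C4Network and net_ the NeuralNet built from it.
    public void learn(PlayedGame game) {
        for (int i = 0; i < game.numMoves(); i++) {
            ConnectFourBoard board = game.boardBefore(i);    // position the mover saw
            int column = game.columnPlayed(i);               // move that was made
            double[] inputs = network_.getInputs(board);
            double[] targets = network_.getTargetOutputs(column);
            net_.train(inputs, targets);                     // one supervised training step
        }
    }

Training toward the moves actually played makes the most sense when the games come from a strong teacher (see "Finding Supervision for Supervised Learning" below).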
Provided Code
API
/classes/cs444/connectfour2 contains quite a bit of support code, including a complete Connect Four program with a basic GUI and several players. Much of the code is the same as what you used in project 2, though there are some additions.
First, the packages:
game contains the core Connect Four game logic and support classes
gui contains the GUI
main contains main programs for running the game in various forms
for the most part, learners are separated from players: learners implement the learning algorithms to update players' stored knowledge, while players use that knowledge (or some other algorithm) to play the game
heuristics contains board state evaluation heuristics for the minimax players
neuralnet contains an implementation of a neural network supporting up to two hidden layers and backpropagation for training
Some individual classes of particular interest:
StatePlayer, StateActionPlayer, and C4NNPlayer are players who play according to state utility values, (state,action) utility values, and a neural network, respectively. You will need to complete the chooseMove methods as indicated by the TODO comments.
StateDB and StateActionDB store utility values associated with states and (state,action) pairs, respectively. They can be saved to and loaded from files.
C4Network is a support class for neural network learners and players, and defines the mapping of board states and moves to/from network inputs and outputs. The sections in C4Network marked with TODO comments need to be filled in for the neural network player/learner.
NeuralNet is a neural network; SigmoidNeuron and StepNeuron implement the different kinds of neurons. NeuralNet can be saved to and loaded from a file.
TDLearner, QLearner, SARSALearner, and C4NNLearner implement the reinforcement and neural network learning algorithms. Fill in the learn methods for those learners that you want to implement as indicated by the TODO comments. The neural network used by C4NNLearner is defined in C4Network.
PlayedGame consists of the moves made during a game. It can be written to or read from a file. For supervised and reinforcement learning, learning is done after a game is played using the information from a PlayedGame.
ReinforcementLearner, StateUtilityLearner, StateActionUtilityLearner, and SupervisedLearner are interfaces implemented by the appropriate types of learners. If you implement a new learner class, implement the appropriate interfaces so your learner fits into the rest of the code.
ConnectFour runs the game with a GUI, ConnectFourPounder carries out many games between the same two players, and ConnectFourTournament carries out many games between different pairs of players. Modify these as needed to work with different players.
MLConnectFour is set up for training and evaluating learning players. It demonstrates creating learners and players, loading a saved database, saving played games and the resulting database, and incorporating learning for reinforcement and supervised learners. You will need to modify this to train/test different players. (Read through all of main to find places you will want to update. Note: as provided, learning is only done for wins. There is no reason to limit learning to wins for reinforcement learners; remove this check [near the end of main]. For supervised learners, think about what is being learned and from what, and thus whether or not it makes sense to learn from all outcomes or only wins.)
HumanPlayer allows a human to play the game (best used in conjunction with the ConnectFour main program).
RandomPlayer makes random moves.
FixedDepthMinimaxPlayer, MinimaxPlayer, and AlphaBetaMinimaxPlayer implement minimax players with fixed depth, iterative deepening, and alpha-beta pruning (with iterative deepening), respectively. GroupHeuristic is a state evaluation heuristic based on the number of 1-, 2-, 3-, and 4-groups belonging to each player. VHeuristic is configured with a StateDB to use for looking up utility values for terminal nodes and a fallback evaluation heuristic to use for states not present in the database. Source code is provided only for VHeuristic; the other classes are available in c4-players.jar, so add it to your project as a library if you want to use any of these classes. (You can also use any of your players and heuristics from project 3; just copy the relevant classes into this project.)
Usage and Important Notes
The players' chooseMove methods and the learners' learn methods should not change the board: if it is necessary to add and/or remove pieces as part of choosing the next move or learning from a game, either copy the board first and only modify the copy, or make sure you undo any changes.
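One way to honor this when you do modify the board directly is a make/evaluate/undo pattern; addPiece, removePiece, and evaluate below are hypothetical names for whatever mutation and evaluation methods your code actually uses.

    // Hedged sketch: the finally block guarantees the board is restored
    // even if evaluation throws or the method returns early.
    board_.addPiece(col, myPiece_);
    try {
        double u = evaluate(board_);      // whatever lookup or evaluation you are doing
        // ... use u ...
    } finally {
        board_.removePiece(col);          // board_ ends up exactly as it started
    }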
You should try to fit what you want to do into the provided framework before making changes to the framework, but you may find that you need to change something. Adding constructors and such is likely to be fine, but it is a good idea to discuss your plans with me before making changes, especially if those changes involve parameters to existing methods. In particular, you should refrain from changing the Player interface as your player will then not be able to participate in the tournament.
Reinforcement Learning for Two
Adapting temporal difference for two players was discussed in class; the idea is to treat s as the state before the player’s move and s’ as the state following the player’s move and the opponent’s subsequent move, so that the states for which utility values are being learned are always the states when it is the player’s turn.
You will need to think about how to adapt Q learning and SARSA.
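For reference, the standard single-agent forms of the updates are written below in LaTeX notation, with r the reward for the transition; under the two-player convention above, s' is the next state in which it is the player's turn (that is, the state after the opponent has replied), and the adaptation changes what s' means rather than the form of the update.

    U(s) \leftarrow U(s) + \alpha \left( r + \gamma\, U(s') - U(s) \right)                      (temporal difference)

    Q(s,a) \leftarrow Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right)     (Q learning)

    Q(s,a) \leftarrow Q(s,a) + \alpha \left( r + \gamma\, Q(s',a') - Q(s,a) \right)             (SARSA, where a' is the move actually chosen in s')

For a transition that ends the game there is no successor state, so the gamma term drops out and the update is driven by the terminal reward alone.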
Finding Supervision for Supervised Learning
Training a neural net via backpropagation is a supervised learning technique, which means you need examples of board states with the right answer (whether it is which move to make or a state utility value). Where do those come from? A few ideas:
Human-played games: you can use HumanPlayer and save the resulting game.
Use the minimax players as teachers. You can vary the strength of the teacher through your choice of depth (for fixed depth) or time limit (for iterative deepening and alpha-beta pruning).
One important thing to keep in mind is that the two players in the game don't encounter the same board positions: the player that goes first is always playing on a board with equal numbers of each color of game piece, while the player that goes second is always playing on a board with one more of the first player's pieces.
To help with building up a training set, you may share saved games with others in the class if you wish.
Learning Parameters
Don't forget about the alpha, gamma (for some learning algorithms), and explore parameters; consider what effect these values have on the learning process. You may want to experiment with different values and/or adjust the values as your players learn.
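As one illustration (not a requirement), a common approach is to start with relatively large alpha and explore values and shrink them over the course of training, so early games explore broadly and later games refine what has already been learned. The setAlpha and setExplore calls below are hypothetical setters; use whatever mechanism the provided learners and players actually expose.

    // Hedged sketch: decay the learning rate and exploration probability
    // over a training run of NUM_GAMES self-play games.
    final int NUM_GAMES = 100000;
    for (int g = 0; g < NUM_GAMES; g++) {
        double progress = (double) g / NUM_GAMES;                      // 0 at the start, near 1 at the end
        learner.setAlpha(Math.max(0.01, 0.5 * (1.0 - progress)));      // 0.5 early, floor of 0.01 late
        player.setExplore(Math.max(0.02, 0.3 * (1.0 - progress)));     // 0.3 early, floor of 0.02 late
        // ... play one game and call learner.learn(...) on the result ...
    }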
Handin
Hand in a written copy of your writeup.
Hand in your code by copying your Eclipse project folder to a project4 folder within your handin directory (identified by your username) in /classes/cs444/handin. Make sure your Java files are in a src subdirectory of the project4 folder.
Also hand in any player knowledge files to your handin directory; they should go into the top-level project folder (project4), not in the src subdirectory. Make sure they have recognizable names.
last updated: Tue Apr 23 22:56:02 EDT 2019; page owned by: [email protected]