Project 2: The Spacetime Crawler
This assignment is to be done in groups of up to 3. You can use text-processing code that you or any classmate on your team wrote for the previous assignment. You cannot use crawler code written by classmates who are not in your group. Use code found on the Internet at your own peril: it may not do exactly what the assignment requests. If you do end up using code you find on the Internet, you must disclose its origin. As stated in the course policy document, concealing the origin of a piece of code is plagiarism. Use the Piazza discussion board for general questions whose answers can benefit everyone.
Your crawler is standalone but shares data with the rest of the crawlers in the course. Each crawler has its own frontier on a server at ICS and can manage that frontier. The frontier does much of the heavy lifting: your crawler is given one URL at a time and should download and process it.
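As an illustration of the "process it" part, here is a minimal sketch of one common processing step: pulling the outgoing links out of a downloaded page. It is hypothetical and uses only the Python standard library (html.parser, urllib.parse). The names `LinkExtractor` and `extract_next_links` are illustrative, not part of the provided crawler code, and how extracted links reach the shared frontier is not specified here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_next_links(base_url, html):
    """Return absolute http(s) URLs linked from `html`, without fragments or duplicates.

    Hypothetical helper: the generated crawler code may organize this step differently.
    """
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    links = []
    seen = set()
    for href in parser.hrefs:
        # Resolve relative links against the page URL, then drop any "#fragment"
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        # Skip non-web schemes such as mailto: and keep the first occurrence only
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

For example, for the page `https://www.ics.uci.edu/index.html`, a relative link `/about` becomes `https://www.ics.uci.edu/about`, a link to `https://www.ics.uci.edu/#top` becomes `https://www.ics.uci.edu/`, and a `mailto:` link is dropped.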
Implementing your Project
Step 1 Getting the project
git clone https://github.com/Mondego/spacetime-crawler
Step 2 Installing the dependencies
Make sure you do not have conflicting library versions by issuing the command:
python -m spacetime version
You should see the following output:
Spacetime Version is 2.0
Rtypes Version is 2.0
If the output does not match, or if the command returns the error "unrecognized argument: version", uninstall the old spacetime and rtypes packages by issuing the following commands:
python -m pip uninstall spacetime
python -m pip uninstall rtypes
Then get the latest version of the spacetime-crawler repository to obtain the latest spacetime and rtypes. Both packages are included with the assignment.
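If you prefer to script the version check in this step, here is a minimal sketch that runs the same command and compares its output with the versions listed above. The helper names (`parse_versions`, `packages_to_reinstall`) are hypothetical, and the sketch assumes the command prints lines exactly of the form `<Name> Version is <version>`, as in the expected output shown in this step.

```python
import re
import subprocess
import sys

# Versions listed in the handout's expected output.
EXPECTED = {"Spacetime": "2.0", "Rtypes": "2.0"}


def parse_versions(output):
    """Map package name -> version from lines like 'Spacetime Version is 2.0'."""
    versions = {}
    for line in output.splitlines():
        match = re.match(r"\s*(\w+) Version is (\S+)\s*$", line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def packages_to_reinstall(output):
    """Return the expected packages whose reported version is missing or different."""
    found = parse_versions(output)
    return sorted(name for name, want in EXPECTED.items() if found.get(name) != want)


if __name__ == "__main__":
    # Run the handout's check command with the current interpreter.
    # An error such as "unrecognized argument" goes to stderr, so stdout
    # then contains no version lines and both packages are reported.
    result = subprocess.run(
        [sys.executable, "-m", "spacetime", "version"],
        capture_output=True,
        text=True,
    )
    bad = packages_to_reinstall(result.stdout)
    if bad:
        print("Reinstall needed for:", ", ".join(bad))
    else:
        print("spacetime and rtypes versions match")
```

The script only reports which packages look wrong; running the uninstall commands and updating the repository is still up to you. `capture_output` requires Python 3.7 or later.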
Step 3 Writing the required classes, functions, and parameters
You must set these up correctly to get credit for the project. If we can't trace your crawler in our logs, it is equivalent to not doing the project.
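Since your crawler is traced through the details in team.txt, a quick format check before running generate_crawler_application.py can catch typos early. The following is a minimal, hypothetical helper (not part of the provided code) for the format described below: one `UCInetID,student_number` pair per line, for a team of up to 3. The allowed characters (lowercase letters and digits for the UCInetID, exactly 8 digits for the student number) are assumptions based on the handout's example.

```python
import re

# Assumption: lowercase alphanumeric UCInetID and an 8-digit student number,
# matching the handout's example lines; adjust if your IDs differ.
LINE_PATTERN = re.compile(r"^([a-z0-9]+),(\d{8})$")


def parse_team_file(text):
    """Return [(ucinetid, student_number), ...] or raise ValueError on a bad line."""
    members = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # this sketch skips blank lines, e.g. a trailing newline
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"team.txt line {lineno} is malformed: {raw!r}")
        members.append((match.group(1), match.group(2)))
    # The assignment allows groups of up to 3.
    if not 1 <= len(members) <= 3:
        raise ValueError(f"expected 1 to 3 team members, found {len(members)}")
    return members
```

For example, `parse_team_file(open("team.txt").read())` returns the list of (UCInetID, student number) pairs, or raises `ValueError` naming the first malformed line.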
1. Write the details of you and your teammates in the team.txt file. Each team member goes on a separate line as a comma-separated pair of UCInetID and student number, e.g.:
panteater,12345678
peterant,87654321
2. Run generate_crawler_application.py. It generates files in two folders, applications/search and datamodel/search, containing the crawler code customized with the details in team.txt. If the details