Independent Project
Instructions
You will have the opportunity to test your Python skills by developing a "Crime Analysis Interface". The purpose of this interface is to retrieve information on crimes by neighbourhood during a specified year and month. The interface will also produce statistical summaries, data visualisations, and time-series breakpoints on crime locations for the neighbourhood searched.
Answer all questions, paying attention to the instructions on file and function names. In addition to the requested code files, some questions ask for text answers such as short explanations. Include these, indicating clearly which answers are relevant to which questions, in a single file answers.txt or answers.docx. If you would like to create a more professional report you are welcome to include your graphs in this file as well. When you are done, zip all the requested files into an archive and hand in that zip file, using your candidate number and the module code as the name (e.g., ABCD1_SECU0012_Project.zip).
You are allowed to work in pairs or alone. If you work in pairs, you may hand in the same code but your written answers must be done independently. You must tell us at the beginning of the project who you are working with (using their name and not their candidate number), and you must indicate in your report who you worked with (using their candidate number and not their name) to ensure anonymity.
When working in pairs, some programs may end up being written mainly by one person. Even so, both members must be able to explain on their own what the code does.
-
Downloading data using the API [25 marks]
-
Downloading the dataset [10 marks]
We will analyse the crimes recorded over two years in the boroughs of London. The data.police.uk website provides an API to download the different datasets automatically. The URL for this dataset is https://data.police.uk/api/outcomes-at-location?date=2022-01&lat=51.5098656&lng=-0.118092.
Once you have saved all the data downloaded in a variable, print the number of crimes in the dataset and add as a comment below a description of the structure of the data. If you need further print (or pprint) statements to understand the structure of the dataset, use them and comment the code in an appropriate manner, but the print statement containing the number of crimes in the dataset needs to be the last one.
Create a script named downdata.py. Your first task is to download the data at the link above and print the keys of the JSON data together with the type of the value paired with each key. You will then save the data in a pickle file called crimedata.p.
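The steps above could be laid out as in the following minimal sketch. It assumes the API answers with a JSON list of records, each of which is a dictionary; check this against what you actually receive. The function names are illustrative and not required by the brief.

```python
import json
import pickle
from urllib.request import urlopen

URL = ("https://data.police.uk/api/outcomes-at-location"
       "?date=2022-01&lat=51.5098656&lng=-0.118092")


def download(url):
    """Fetch the URL and decode the JSON body into Python objects."""
    with urlopen(url) as response:
        return json.load(response)


def describe_keys(record):
    """Map each key of one record to the type name of its value."""
    return {key: type(value).__name__ for key, value in record.items()}


def save_pickle(data, path="crimedata.p"):
    """Write the downloaded data to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(data, handle)


# Typical use in the script (this performs a network request):
# data = download(URL)
# print(describe_keys(data[0]))   # assumes a non-empty list of dicts
# save_pickle(data)
# print(len(data))                # number of crimes, printed last
```

Keeping the download, the inspection and the saving in separate functions makes it easier to reuse them in the later scripts.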
-
API requests based on location [15 marks]
The records saved in the previous exercise use the latitude and longitude identifying London. Use the locations.csv file from Moodle to import the data for the different locations around the UK. Create a function in the script userlocation.py that asks the user to input up to 5 locations and creates a nested dictionary with each location as a key and its latitude and longitude as the items of the inner dictionary. The nested dictionary will be the output of your function, and you will use it to submit the API requests, one request per location, inserting the correct values of latitude and longitude. Print the nested dictionary and the number of crimes for each location at the end of the script.
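One possible shape for this task is sketched below. The column names "location", "lat" and "lng" are assumptions about locations.csv, so adjust them to the real header row. The helper names and the `input_fn` parameter (which only exists to make the function easy to test) are illustrative as well.

```python
import csv


def load_locations(path="locations.csv"):
    """Read the CSV into {name: {"lat": ..., "lng": ...}}.

    The column names used here are assumptions; change them to match
    the actual header of locations.csv.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return {
            row["location"].strip().lower(): {"lat": float(row["lat"]),
                                              "lng": float(row["lng"])}
            for row in csv.DictReader(handle)
        }


def ask_locations(known, max_locations=5, input_fn=input):
    """Ask for up to max_locations names; a blank answer stops early."""
    chosen = {}
    while len(chosen) < max_locations:
        name = input_fn("Location (blank to finish): ").strip().lower()
        if not name:
            break
        if name in known:
            chosen[name] = known[name]
        else:
            print(f"Unknown location: {name!r}, please try again.")
    return chosen


def build_url(lat, lng, date="2022-01"):
    """Build the API URL for one location."""
    return ("https://data.police.uk/api/outcomes-at-location"
            f"?date={date}&lat={lat}&lng={lng}")
```

Looping over the returned dictionary and calling `build_url(item["lat"], item["lng"])` for each entry then gives one request per location.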
-
-
Data Parsing [10 marks]
This section of the project consists of exploring the data and retrieving interesting information. You will use the pickle file you saved as crimedata.p.
-
Import data [3 marks]
Create a script called dataparser.py and start by reading your crimedata.p pickle file.
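Reading the file back is the mirror image of saving it. A minimal sketch, with an illustrative function name:

```python
import pickle


def load_pickle(path="crimedata.p"):
    """Read back the object that was stored with pickle.dump."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.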
-
Select the type of crime [7 marks]
The data contains different types of crime. Create a list with all the different types of crime (e.g. violent-crime, theft-from-the-person) that have been committed and another list with the location subtypes (e.g. road, nightclub, etc.) according to the dataset.
Print the two lists.
When asking the user for input, take the necessary precautions so that the input is valid (e.g. converting the input to a lowercase string and re-asking the user when the input is not recognised). You should save the user input into variables that will serve as search parameters in the following tasks.
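The following sketch shows one way to build the two lists and to validate user input. It assumes each record holds a nested "crime" dictionary with "category" and "location_subtype" keys; verify this against the structure you documented in downdata.py. The function names and the `input_fn` parameter are illustrative.

```python
def unique_values(records, field):
    """Sorted distinct values of record["crime"][field], skipping blanks."""
    values = {rec["crime"].get(field) for rec in records}
    return sorted(value for value in values if value)


def ask_choice(prompt, valid, input_fn=input):
    """Keep asking until the lowercased, stripped answer is in valid.

    The options in valid are expected to be lowercase already.
    """
    while True:
        answer = str(input_fn(prompt)).strip().lower()
        if answer in valid:
            return answer
        print(f"{answer!r} is not one of: {', '.join(valid)}")
```

For example, `crime_type = ask_choice("Crime type: ", unique_values(data, "category"))` keeps prompting until a listed category is entered. Skipping blank values is a design choice here; you may prefer to keep them and label them explicitly.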
-
-
Graphical representations [30 marks]
-
Bar charts [15 marks]
Use the lines of code from the previous exercise to import the data from the pickle file and create two bar graphs.
The first one will have the types of crime on the x axis and the number of occurrences of each crime as the height of its bar.
The second bar plot will focus on the location of the crimes. Plot the occurrences of each location subtype for the violent-crime datapoints in our dataset.
Remember that readability matters when plots are used to evaluate results in research work, so take care over how the graphs look.
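A sketch of the counting and plotting for both bar charts follows. It assumes matplotlib is installed and that each record has a nested "crime" dictionary with "category" and "location_subtype" keys; the function names are illustrative.

```python
from collections import Counter


def count_by(records, field, category=None):
    """Count values of record["crime"][field], optionally for one category."""
    return Counter(
        rec["crime"][field]
        for rec in records
        if category is None or rec["crime"]["category"] == category
    )


def plot_bar(counts, title, xlabel, path):
    """Draw a labelled bar chart, most frequent bar first, and save it."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    labels, values = zip(*counts.most_common())
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(labels, values)
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Number of crimes")
    ax.tick_params(axis="x", labelrotation=45)  # keep long labels readable
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

The first chart would use `count_by(data, "category")` and the second `count_by(data, "location_subtype", category="violent-crime")`. Sorting the bars by frequency and rotating the tick labels are two simple ways to improve readability.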
First questions to be answered in the answers file:
1. Which are the most relevant crimes in London according to our dataset? Is this result something you have found in other reports or does it differ?
2. Which are the locations most affected by violent crimes? Is there any interpretation you can give to justify this result?
-
-
Line graph [15 marks]
In the same script (use comments to separate the tasks), create the following line graph using the months and years as ticks on the X axis. While the API string specifies the date of the last case update, the data contains the crime date as well. Plot four lines:
- One line for all the crimes in the dataset
- One line for the theft from person type of crime
- One line for the other theft
- One line for the violent crimes
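The four lines listed above could be produced along the lines of the sketch below. It assumes matplotlib is installed, that the crime date is stored as a "YYYY-MM" string under record["crime"]["month"], and that the category codes are "theft-from-the-person", "other-theft" and "violent-crime"; check all of these against your data. The function names are illustrative.

```python
from collections import Counter


def monthly_counts(records, category=None):
    """Return {"YYYY-MM": count} sorted by month, optionally for one category."""
    counts = Counter(
        rec["crime"]["month"]
        for rec in records
        if category is None or rec["crime"]["category"] == category
    )
    return dict(sorted(counts.items()))


def plot_lines(records, path,
               categories=("theft-from-the-person", "other-theft", "violent-crime")):
    """Plot one line for all crimes and one line per listed category."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    months = list(monthly_counts(records))  # every month present in the data
    series = {"all crimes": monthly_counts(records)}
    for category in categories:
        series[category] = monthly_counts(records, category)

    fig, ax = plt.subplots(figsize=(10, 5))
    for label, counts in series.items():
        # Months with no crime of this type are plotted as zero.
        ax.plot(months, [counts.get(m, 0) for m in months], marker="o", label=label)
    ax.set_xlabel("Month of crime")
    ax.set_ylabel("Number of crimes")
    ax.tick_params(axis="x", labelrotation=45)
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Because "YYYY-MM" strings sort chronologically, a plain sort is enough to put the ticks in date order.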
Third question to be answered in the answers file:
3. Which differences do you notice among the lines in the plot?
-
-
-
Folium: plotting on a map [20 marks]
The data we are analysing has precise locations around the city that can be visualised on a map. To do this, we will use the library called folium. The instructions on how to install the library are at the following link: https://pypi.org/project/folium/.
The documentation explaining the library is available at https://python-visualization.github.io/folium/. The library offers several commands; in the two subsections of this part we indicate the ones we suggest you use. If you find another command more useful for how you envisage the graph, there is no problem at all in using it.
-
A point for each crime [10 marks]
Write a mappoint.py script for this task. Import the data saved in crimedata.p and use the code written previously to load and parse the dataset in the way you prefer.
Use folium.Map to create the map of London and folium.CircleMarker to indicate the point where each crime has been committed according to the latitude and longitude values in the dataset.
Graph and save the resulting map.
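A minimal sketch of the point map follows. It assumes folium is installed and that each record stores its coordinates as strings under record["crime"]["location"]["latitude"] and ["longitude"]; confirm this against your data. The function names and the output file name are illustrative, and the default centre reuses the coordinates from the API URL.

```python
def extract_points(records):
    """Return (lat, lng) floats for every record that has coordinates."""
    points = []
    for rec in records:
        location = rec["crime"].get("location") or {}
        try:
            points.append((float(location["latitude"]),
                           float(location["longitude"])))
        except (KeyError, TypeError, ValueError):
            continue  # skip records without usable coordinates
    return points


def make_point_map(points, path="crime_points.html",
                   centre=(51.5098656, -0.118092)):
    """Draw one small circle per crime and save the map as HTML."""
    import folium  # imported here so extract_points works without it

    crime_map = folium.Map(location=list(centre), zoom_start=14)
    for lat, lng in points:
        folium.CircleMarker(location=[lat, lng], radius=3, color="red",
                            fill=True, fill_opacity=0.6).add_to(crime_map)
    crime_map.save(path)
    return crime_map
```

The saved HTML file can be opened in any browser to inspect the result.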
-
Cluster crimes on the map [10 marks]
In the same script, create another graph where the crimes are clustered in groups. Use folium.Map to load the map of London and folium.plugins.FastMarkerCluster to instruct folium to create the clusters according to the data you will provide. Use folium.Marker for each crime in the dataset using the accurate values of latitude and longitude to place the marker on the map.
Graph and save the resulting map.
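The clustered map could be sketched as follows, assuming folium is installed and that `points` is a list of (latitude, longitude) float pairs taken from the dataset. The function names and the output file name are illustrative.

```python
def map_centre(points):
    """Mean latitude and longitude, used to centre the map."""
    lats, lngs = zip(*points)
    return sum(lats) / len(lats), sum(lngs) / len(lngs)


def make_cluster_map(points, path="crime_clusters.html"):
    """Cluster all crime locations and save the map as HTML."""
    import folium  # imported here so map_centre works without it
    from folium.plugins import FastMarkerCluster

    crime_map = folium.Map(location=list(map_centre(points)), zoom_start=13)
    # FastMarkerCluster takes the coordinate list directly and builds
    # the markers in the browser.
    FastMarkerCluster(data=[list(point) for point in points]).add_to(crime_map)
    crime_map.save(path)
    return crime_map
```

In this sketch the markers are generated from the coordinate list rather than created one by one. If you prefer to create a folium.Marker for each crime yourself, adding those markers to a folium.plugins.MarkerCluster is a common alternative.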
Fourth question to be answered in the answers file:
4. Are there hotspots in our dataset? Can you explain why, based on the graphs, and which graph shows the hotspots more effectively?
-
-
Handing in
Remember to submit all the necessary files in a compressed folder named using your Candidate Number (ABCD1_SECU0012_Project.zip). The files are:
- all the .py scripts
- the pickle and csv files storing information according to the instructions
- the answers.docx (or whichever other text format) file containing your answers to the questions in the last sections
- if working in a pair, the partner.txt file containing the candidate number of your partner
You may have noticed that the sections only add up to 85 marks in total. The remaining 15 marks are assigned based on the answers in the answers.docx file.
Remember to comment your files and use functions where you feel it is appropriate.