Yelp Data

You are given a file called businesses.json containing information from Yelp about businesses in the Rensselaer region, a longer version of the file you worked with in Lab #4. Each line of the file is a JSON-formatted string for a single business. You can convert these strings to Python objects using the module called json. To get started, first try the following in WingIDE:

    import json

    for line in open('businesses.json'):
        b = json.loads(line)
        print b
        break

We are looking at only the first line of the file (hence the break). Executing this gives you the dictionary below:

    {'city': 'Cohoes', 'review_count': 2, 'name': 'Joes Tavern',
     'neighborhoods': [],
     'url': 'http://www.yelp.com/biz/joes-tavern-cohoes',
     'type': 'business', 'business_id': 'A_Fm4v2-gQuGBBI-GBx4Uw',
     'full_address': '16 Division St\nCohoes, NY 12047',
     'latitude': 42.7808878, 'state': 'NY', 'longitude': -73.7102039,
     'stars': 4.0, 'schools': ['Rensselaer Polytechnic Institute'],
     'open': True, 'categories': ['Pizza', 'Restaurants'],
     'photo_url': 'http://s3-media1.ak.yelpcdn.com/assets/2/www/img/924a6444ca6c/gfx/blank_biz_medium.gif'}

Let us go through this step by step.

Structure of information for a business

First, take a look at the dictionary for a business. The variable b is a dictionary with 16 keys. Of these, name is a string, full_address is a string, categories is a list of strings, review_count is an integer, and stars is a float. You will also need the latitude and longitude of the business (both floats) for the distance computation. The remaining fields for a business will not be needed for this homework.

JSON

We used a new module called json, which converts values between built-in Python objects and strings. JSON, which stands for JavaScript Object Notation, is a data format used by many Web sites. You can convert a Python object to a string using the function dumps(), and you can convert a string containing valid JSON to a Python object using loads(). You can play with this to see the results:

    >>> import json
    >>> x = {'a': [1, 2], 'b': True}
    >>> y = json.dumps(x)
    >>> y
    '{"a": [1, 2], "b": true}'
    >>> json.loads(y)
    {u'a': [1, 2], u'b': True}

Note that json does not recognize sets, so you cannot store sets in a string using this module. The act of converting an object to a string is called serializing it.

Unicode

As in the above example, when you decode a line from the file we gave you, you actually see something like:

    {u'city': u'Cohoes', u'review_count': 2, ...}

where each string has a preceding u. This means that all the strings are Unicode strings. For all intents and purposes, Unicode strings are just regular strings, and you can use all the string functions on them as before. So, don't worry about the u and skip the rest of this part. If you want to know more about Unicode, read on.
Unicode strings are the same as regular (ASCII) strings if you only use characters from the English alphabet.

    >>> u'cat' == 'cat'
    True

When you use special characters that are not in English, the internal code may vary depending on the encoding:

    >>> x = u'Schrödingers cat'
    >>> print x
    Schrödingers cat
    >>> x
    u'Schr\xf6dingers cat'
    >>> y = 'Schrödingers cat'
    >>> print y
    Schrödingers cat
    >>> y
    'Schr\xc3\xb6dingers cat'

The internal encoding of a regular string is not necessarily Unicode, so the same letter can end up with a different code. The result also differs depending on the specific settings of your computer. If you use Unicode, the code will always be the same. This is not really important to solving this homework, but you must understand why Unicode is used so uniformly in programs that handle text: you cannot force people to use only English spellings of words. (PS. I had to tell my editor for this homework to use Unicode to print the above example as well!)

Homework Requirements

You should write a program that does the following:

Continuously ask the user first for a location. Location is required. If the user enters an empty location or -1, your program should stop its main loop and exit. You should not ask for a category in this case.

If a location is entered, you must ask for a category to filter by. Category is optional. If the user leaves the category blank, there is no category criterion.

If a location is provided, your program should find all businesses whose full_address field contains the location anywhere in the address string and, if the user entered a category, that fall within that category.

Report to the user:

1. the number of businesses matching the given criteria
2. the max distance between all the businesses that match the given criteria (see the explanation below for the distance computation)
3. the categories the matched businesses fall under and the number of matched businesses within each category (a dictionary is very useful here).

Ask the user whether she wants to see the list of businesses.
If the user enters anything but Y or y, go back to asking for location and category. If the answer is Y or y, print the name, the number of reviews, the number of stars and the address for each matching business.

Note 1: If 0 or 1 businesses are found, the max distance between them is clearly 0. A note of advice: you cannot use the max() function on an empty list.

Note 2: You will ask whether to list the businesses only if you find at least 1 matching business.

Note 3: There is no required ordering for the printed businesses. I print them in the order they appear in the file. You can choose whatever is convenient for you. The categories are ordered alphabetically, which I expect you to do as well for readability.

Your program should work regardless of capitalization in the input or the data: lower case, upper case, or any mixture.

Remember: try to get as close to this as possible, but if you do not, we will give plenty of partial credit. In fact, I describe the approximate rubric so that you can see what each part is worth.

Finding maximum distance between businesses

To find the distance between two points, you will use a function given to you in the hw8util module, dist(lat1,long1,lat2,long2), which takes two pairs of latitude and longitude values and returns a distance in miles. For example, we can find the distance between RPI Union and Dinosaur BBQ (an "as the crow flies" distance):

    >>> import hw8util
    >>> hw8util.dist(42.730863399999997, -73.681679299999999, 42.734591000000002, -73.688817)
    0.4445660592440111

To find the maximum distance between a number of businesses, you must first find all possible pairwise distances and then take the max. For example, if we have three businesses a, b, c with distances:

    distance between a and b is 3 miles
    distance between b and c is 0.2 miles
    distance between a and c is 2.8 miles

then you must return 3 miles as the maximum distance. Try to write your program so that you compare each pair of values only once! This will become very important as you work with larger and larger sets of values.
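One possible way to visit each pair exactly once is sketched below. It is only an illustration, not the expected solution: haversine_miles is a hypothetical stand-in for hw8util.dist (which is not reproduced here, so its results may differ slightly), and max_distance is an illustrative name. The sketch assumes each business is a dictionary with latitude and longitude keys, as in the data file.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


def haversine_miles(lat1, long1, lat2, long2):
    """Hypothetical stand-in for hw8util.dist: great-circle distance in miles."""
    earth_radius_miles = 3958.8  # assumed mean Earth radius
    dlat = radians(lat2 - lat1)
    dlong = radians(long2 - long1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlong / 2) ** 2)
    return 2 * earth_radius_miles * asin(sqrt(a))


def max_distance(businesses, dist=haversine_miles):
    """Return the largest pairwise distance, or 0 for fewer than two businesses."""
    best = 0.0
    # combinations() yields each unordered pair once: (a, b) but never (b, a),
    # so n businesses need n * (n - 1) / 2 distance computations.
    for first, second in combinations(businesses, 2):
        d = dist(first['latitude'], first['longitude'],
                 second['latitude'], second['longitude'])
        if d > best:
            best = d
    return best
```

Starting from 0.0 instead of calling max() on a list also covers the case of 0 or 1 matching businesses without any special handling.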
Extra Credit (20 points)

Homework extra credit allows you to make up for lost homework points. If you have more than 100 in the homework average, we will truncate it at 100. But you must still attempt this for an extra challenge.

Change your program so that the user can also enter multiple categories separated by commas (there may be spaces that you must strip). Every category listed must be present in each business returned. Furthermore, if you enter ~ before a category, then that category must not be present in the businesses returned. For example:

    pizza, italian returns businesses that have both pizza and italian in their categories.
    pizza,~italian returns all businesses that have pizza but not italian in their categories.
    pizza, ~italian, food returns all businesses that have pizza and food but not italian in their categories.

As before, if no category is given, your program should return all businesses in the given location. The remainder of your program works the same way.

Deliverables - Final Check

You must use the program structure outlined in Homework #7, i.e.

    import json

    def function1(x):
        return x + 1

    if __name__ == '__main__':
        z = 10
        print function1(z)

with no global variables above the if __name__ == '__main__': line.

Your program must use dictionaries and the json module to read the file. After that you can do what you want, but using a dictionary will simplify your code considerably.

Submit a single file called hw8.py that assumes the existence of a text file called businesses.json. The test file will, by the way, be identical to the one we give you.

The main loop of the program should exit if -1 or nothing is entered for location; otherwise it should loop and ask for the next location and category. If it does not exit in this case, you will get EOF errors. Print any input that you read, as we have been doing.
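The extra-credit category syntax (comma-separated names, with ~ marking a category to exclude) could be handled along the following lines. This is a sketch under stated assumptions, not the required design: parse_categories and matches_categories are hypothetical names, and the sketch assumes a category matches only when the whole name is equal, ignoring case, rather than by substring.

```python
def parse_categories(text):
    """Split 'pizza, ~italian' into (required, excluded) sets of lower-case names."""
    required, excluded = set(), set()
    for token in text.split(','):
        token = token.strip().lower()
        if token.startswith('~'):
            name = token[1:].strip()
            if name:
                excluded.add(name)
        elif token:
            # Blank tokens (including an entirely empty input) are ignored.
            required.add(token)
    return required, excluded


def matches_categories(business, required, excluded):
    """True if the business has every required category and no excluded one."""
    # Assumption: whole-name, case-insensitive comparison.
    have = set(c.lower() for c in business['categories'])
    return required <= have and not (excluded & have)
```

With an empty category string both sets are empty, so every business at the location passes the filter, which matches the "no category given" behavior described above.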
Here are some (rough) grade guidelines:

    10 points: correct program structure
    20 points: correct execution of your program while loop and interaction with the user (we cannot test your code without this)
    30 points: finding the businesses correctly
    20 points: printing the matching categories and counts
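For the category counts mentioned in the guidelines, a dictionary keyed by category name is one natural fit. The sketch below is illustrative only: count_categories and format_counts are hypothetical names, and the "name: count" line format is an assumption, not the required output format.

```python
def count_categories(businesses):
    """Map each category name to the number of businesses listing it."""
    counts = {}
    for business in businesses:
        for category in business['categories']:
            # get() supplies 0 the first time a category is seen.
            counts[category] = counts.get(category, 0) + 1
    return counts


def format_counts(counts):
    """Return one 'name: count' line per category, in alphabetical order."""
    return ['%s: %d' % (name, counts[name]) for name in sorted(counts)]
```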