[SOLVED] lisp python CSE 231


CSE 231
Fall 2019
Project #6
This assignment focuses on the design, implementation and testing of a Python program which uses lists to solve the problem described below.
It is worth 45 points (4.5% of course grade) and must be completed no later than 11:59 PM on Monday, October 21.
Assignment Overview Lists
Assignment Purpose
Countries all over the world use the internet, and while the major websites are used by most countries, more specific websites may show what a country's culture, interests and people are like. By taking a closer look at the top-ranked websites for each country and the top websites in general, trends in internet consumption can be identified. A closer look can also be taken at how a search engine might locate closely fitting results when looking through websites. Our source for data is Kaggle (https://www.kaggle.com/bpali26/popular-websites-across-the-globe).
Project Specifications
Data: We provide .csv (comma-separated value) files. They need to be opened with the specified encoding because there are some non-UTF-8 characters in the file, and they need to be read using csv.reader because there are commas within some fields. We are interested in five items in each row of data. If any of these fields are empty, or any numeric field cannot be converted to int, ignore the entire row of data:

Name                  Index   Convert to this type   Note
Country_Rank          0       int
Website               1
Traffic_Rank          14      int                    Remove internal spaces
Avg_Daily_Pageviews   5       int                    Remove internal spaces
Country               30

Traffic_Rank is the overall (global) rank of a website in the world, whereas Country_Rank is the rank of a website within the country specified at index 30.
You must implement the following functions:
a) open_file() : This function prompts the user for a file name. It will try to open the file and return the file pointer. If the file is not found, the function must print an error message and keep asking the user for a file name until a file can be opened and returned. When using the open command, include encoding="ISO-8859-1" so the file can be opened properly.
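The prompt-until-success behavior described for open_file() can be sketched as follows. This is only one possible shape, not the reference solution; the prompt and error strings are copied from the Test 2 sample output further down.

```python
def open_file():
    """Keep prompting until a file can be opened, then return the file object."""
    while True:
        filename = input("Input a filename: ")
        try:
            # ISO-8859-1 is the encoding the assignment asks for.
            return open(filename, encoding="ISO-8859-1")
        except FileNotFoundError:
            print("Error: file not found.")
```

The loop only ends through the return statement, so the caller always receives an open file.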

b) read_file( ) : This function converts the file into a list of tuples so we can view and manipulate the data. To read the opened file, use reader = csv.reader( ); to skip over the header line, use next(reader, None).
Extract the following data into a tuple from each row and add each tuple to a list that you will return. When going through each row, convert the country rank, the traffic rank and the average daily page views into integers; remove the spaces from them before converting. If any of them cannot be converted to an integer (e.g. in the case of N/A), skip the row. Create a tuple:
(rank, website, traffic, page_views, country)
Return the list of tuples sorted in ascending order (in increasing order) by country rank and alphabetically by country.
Hints: 1) try-except is very useful when you try to convert those values to int.
2) Both list.sort() and sorted() have a key parameter that takes a function of a single argument; that function returns the key to use for sorting purposes. The operator module's itemgetter function allows multiple levels of sorting. For example, key=itemgetter(1,2) means the sorting will be done in ascending order based on item 1, then item 2.
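Putting the two hints together, read_file might look like the sketch below. It assumes the function receives the already opened file as its parameter (named fp here for illustration) and uses the column indices from the table above; treating a too-short row like an unconvertible one is an added assumption.

```python
import csv
from operator import itemgetter


def read_file(fp):
    """Return a sorted list of (rank, website, traffic, page_views, country)."""
    reader = csv.reader(fp)
    next(reader, None)  # skip the header line
    sites = []
    for row in reader:
        try:
            rank = int(row[0].replace(" ", ""))
            website = row[1]
            traffic = int(row[14].replace(" ", ""))
            page_views = int(row[5].replace(" ", ""))
            country = row[30]
        except (ValueError, IndexError):
            # Empty or non-numeric field such as "N/A", or a short row.
            continue
        if not website or not country:
            continue
        sites.append((rank, website, traffic, page_views, country))
    # Country rank first, then country name alphabetically.
    sites.sort(key=itemgetter(0, 4))
    return sites
```

An empty numeric field raises ValueError in int(), so the same except clause covers both the "empty" and the "N/A" cases.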
c) remove_duplicate_sites( ) : This function will go through our main list of tuples and remove duplicate sites from the list (since most countries have google.com listed as their most used web site). While going through each item in our main list, split the website URL string by the dot and keep track of the domain name of the URL (google.com should be considered to be the same as google.ru since we're ignoring the extensions). You'll want to return a list of tuples with no duplicate sites (for example, google.com should only appear once even though other countries may have google.fr or google.ru listed). The tuples in the list are the same as specified in the read_file function: (rank, website, traffic, page_views, country). Sort the list in ascending order of country rank and then by website, that is, key=itemgetter(0, 1). Hint: keep a list of sites you have seen and use that list to decide if a site is a duplicate or not.
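A minimal sketch of the "list of sites you have seen" hint follows. It assumes every URL has the form www.name.extension, as in the sample data, so the domain name is the second piece after splitting on the dot; the parameter name sites is illustrative.

```python
from operator import itemgetter


def remove_duplicate_sites(sites):
    """Keep the first tuple seen for each domain name, then sort the result."""
    seen = []    # domain names already kept
    unique = []
    for site in sites:
        # "www.google.com".split(".") -> ["www", "google", "com"]
        domain = site[1].split(".")[1]
        if domain not in seen:
            seen.append(domain)
            unique.append(site)
    return sorted(unique, key=itemgetter(0, 1))
```

Because the first occurrence wins, the order of the input list decides which of several duplicates survives.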
d) top_sites_per_country( , ) : This function will return the top 20 ranked sites for a specified country (the string parameter), sorted by country rank in increasing order. The list parameter is the list returned by the read_file() function. The tuples in the list are the same as specified in the read_file function: (rank, website, traffic, page_views, country). Return the sorted list.
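One way to express the filter, sort and slice steps for top_sites_per_country is sketched here. The parameter names sites and country are illustrative, and the country comparison is assumed to be an exact string match.

```python
from operator import itemgetter


def top_sites_per_country(sites, country):
    """Return up to 20 tuples for the given country, lowest country rank first."""
    matches = [site for site in sites if site[4] == country]
    return sorted(matches, key=itemgetter(0))[:20]
```

Slicing with [:20] is safe even when fewer than 20 sites match, since it simply returns what is there.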
e) top_sites_per_views( ) : This function will return the top 20 sites ranked by page views in descending order. The tuples in the list are the same as specified in the read_file function: (rank, website, traffic, page_views, country). (Optional challenge: write this as a two-line function.) Note that there are duplicate entries for some common websites (for example, www.google.com and www.google.fr). You should keep the website entry with the highest page views (for example, www.google.com will have the highest of all google sites). You should first sort your data by page_views in descending order and remove the duplicates using the remove_duplicate_sites() function. You should then return the top 20 sites ranked by page views in descending order.
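The sort, de-duplicate, re-sort sequence described for top_sites_per_views can be sketched as below. A compact remove_duplicate_sites is repeated only so the block runs on its own; it makes the same www.name.extension assumption about URLs. The second sort is needed because remove_duplicate_sites returns its result ordered by country rank and website.

```python
from operator import itemgetter


def remove_duplicate_sites(sites):
    """Keep the first tuple seen per domain name, sorted by rank then website."""
    seen, unique = [], []
    for site in sites:
        domain = site[1].split(".")[1]
        if domain not in seen:
            seen.append(domain)
            unique.append(site)
    return sorted(unique, key=itemgetter(0, 1))


def top_sites_per_views(sites):
    """Return the 20 distinct sites with the most page views, highest first."""
    # Sorting by views first makes the highest-view entry the one that is kept.
    by_views = sorted(sites, key=itemgetter(3), reverse=True)
    unique = remove_duplicate_sites(by_views)
    return sorted(unique, key=itemgetter(3), reverse=True)[:20]
```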
f) main(): This function lets the user select from 3 options. Begin by prompting for a file, opening the file, and reading the file. Then loop, asking the user which option they would like to pursue. Valid choices are 1, 2, 3 or q (case-insensitive); entering q or Q exits the program. If any other choice is entered, an error message will be printed and the user will be prompted for an option again.
a. If option 1 is selected, this option will call the top_sites_per_country function to return the top 20 sites for the country of choice. The user should be prompted for a country. Then the website, traffic rank and average daily page views for the top 20 results should be displayed using this formatted string:
{:30s} {:>15d}{:>30,d}
b. If option 2 is selected, this option will prompt for a string and return sites that match. The user should be prompted for a keyword, and if any of the domain names in our list (that is, our list of websites) contain the keyword, they will be displayed. Make the search keyword lower case before searching. Otherwise, a message should be printed stating that no websites matched the keyword. Use the following formatted string for the title: print("{:^50s}".format("Websites Matching Query")) and "{:<10s}" for each website that will be displayed.
c. Option 3 will return the top 20 visited websites based on the average daily page views. Call the top_sites_per_views function. Display the website and average daily page views using this formatted string: “{:30s} (use “{:30s} {:>15s}{:>25s} for the header)
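The display steps for options 1 and 2 can be sketched with the format strings quoted above. The helper names format_country_row, search_sites and show_matches are hypothetical, not part of the assignment; the "None found" message is taken from the Test 3 sample output. Matching against the whole website string rather than only the domain name is a simplifying assumption.

```python
ROW_FORMAT = "{:30s} {:>15d}{:>30,d}"  # website, traffic rank, page views


def format_country_row(site):
    """Format one (rank, website, traffic, page_views, country) tuple."""
    rank, website, traffic, page_views, country = site
    return ROW_FORMAT.format(website, traffic, page_views)


def search_sites(sites, keyword):
    """Return the websites whose name contains the lower-cased keyword."""
    keyword = keyword.lower()
    return [site[1] for site in sites if keyword in site[1].lower()]


def show_matches(sites, keyword):
    """Print the centered title, then each match or a 'None found' message."""
    print("{:^50s}".format("Websites Matching Query"))
    matches = search_sites(sites, keyword)
    if matches:
        for website in matches:
            print("{:<10s}".format(website))
    else:
        print("None found")
```

The comma in {:>30,d} is what inserts the thousands separators seen in the sample output.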
Deliverables
The deliverable for this assignment is the following file: proj06.py, the source code for your Python program.
{:>20,d}{:>25s}
Be sure to use the specified file name and to submit it for grading via the Mimir system before the project deadline.
Sample Output
Function Test read_file:
Input: website_tiny.csv
Output:
[(1, www.gazetaexpress.com, 695, 262029, Albania), (2, www.google.com, 1,
4192159833, Afghanistan), (2, www.joq.al, 1375, 152807, Albania), (3,
www.youtube.com, 2, 2679159025, Afghanistan), (3, www.google.com, 1, 4192159833,
Albania), (4, www.facebook.com, 3, 1082985733, Afghanistan), (4,
www.youtube.com, 2, 2679159025, Albania), (5, www.yahoo.com, 6, 383352336,
Afghanistan), (5, www.balkanweb.com, 3410, 128194, Albania), (6,
www.acbar.org, 8011, 712760, Afghanistan), (6, www.google.al, 4300, 1615248,
Albania), (7, www.bbc.com, 104, 24690228, Afghanistan), (7, www.facebook.com,
3, 1082985733, Albania), (8, www.wikipedia.org, 5, 397197324, Afghanistan), (8,
www.gazetablic.com, 5364, 49739, Albania), (9, www.jobs.af, 17003, 169355,
Afghanistan), (9, www.syri.net, 4347, 83070, Albania), (10, www.live.com, 15,
186702228, Afghanistan), (10, www.merrjep.com, 5079, 134347, Albania), (11,
www.bbc.com, 104, 690228, Afghanistan), (11, www.koha.net, 4470, 73839,
Albania)]
Function Test remove_duplicate_sites:
Input Data: [(2, www.google.com, 1, 4192159833, Afghanistan), (4, www.yahoo.com,
7, 1234567, France), (3, www.youtube.com, 2, 2679159025, Afghanistan), (4,
www.facebook.com, 3, 1082985733, Afghanistan), (5, www.yahoo.com, 6, 383352336,
Afghanistan), (6, www.acbar.org, 8011, 712760, Afghanistan), (7, www.bbc.com,
104, 24690228, Afghanistan), (8, www.wikipedia.org, 5, 397197324, Afghanistan),
(9, www.jobs.af, 17003, 169355, Afghanistan), (10, www.live.com, 15, 186702228,
Afghanistan), (11, www.bbc.com, 104, 690228, Afghanistan)]
Output:
[(2, www.google.com, 1, 4192159833, Afghanistan), (3, www.youtube.com, 2,
2679159025, Afghanistan), (4, www.facebook.com, 3, 1082985733, Afghanistan), (4,
www.yahoo.com, 7, 1234567, France), (6, www.acbar.org, 8011, 712760,
Afghanistan), (7, www.bbc.com, 104, 24690228, Afghanistan), (8,
www.wikipedia.org, 5, 397197324, Afghanistan), (9, www.jobs.af, 17003, 169355,
Afghanistan), (10, www.live.com, 15, 186702228, Afghanistan)]
Function Test top_sites_per_country:
Input
[(2, www.youtube.com, 2, 2679159025, Mexico), (3, www.google.com, 1,
4186314171, Mexico), (4, www.facebook.com, 3, 1082985733, Mexico), (5,
www.live.com, 15, 186702228, Mexico), (6, www.yahoo.com, 6, 383352336,
Mexico), (7, www.wikipedia.org, 5, 397197324, Mexico), (8, www.blogspot.mx,
468, 3999662, Mexico), (9, www.mercadolibre.com.mx, 540, 12445106, Mexico), (10,
www.perfecttoolmedia.com, 440, 7230160, Mexico), (11, www.caliente.mx, 848,
2440819, Mexico), (12, www.msn.com, 39, 71019660, Mexico), (13,
www.twitter.com, 12, 206597988, Mexico), (14, www.netflix.com, 31, 58866838,
Mexico), (15, www.ntd.tv, 37, 216392, Mexico), (16, www.instagram.com, 17,
156961142, Mexico), (17, www.adf.ly, 106, 22859613, Mexico), (18,
www.whatsapp.com, 72, 25618356, Mexico), (19, www.xvideos.com, 62, 146500484,
Mexico), (20, www.debate.com.mx, 1108, 272284, Mexico), (21, www.uptodown.com,
104, 4984195, Mexico), (22, www.wordpress.com, 41, 48483097, Mexico), (23,
www.popads.net, 63, 36443085, Mexico), (24, www.sat.gob.mx, 1629, 2984364,
Mexico), (25, www.rolloid.net, 252, 41534, Mexico), (26, www.mileroticos.com,
1012, 958893, Mexico), (27, www.onclkds.com, 44, 49754784, Mexico), (28,
www.unam.mx, 1673, 2004959, Mexico), (29, www.ouo.io, 313, 10973435, Mexico),
(30, www.adexchangeprediction.com, 136, 9024880, Mexico), (31,
www.amazon.com.mx, 2409, 3107430, Mexico), (32, www.animeflv.net, 699, 7742938,
Mexico), (33, www.taringa.net, 330, 5896939, Mexico), (34, www.slideshare.net,
134, 15075653, Mexico), (35, www.shink.in, 451, 8768492, Mexico), (36,
www.tumblr.com, 48, 108349851, Mexico), (37, www.mediafire.com, 156, 13486044,
Mexico), (38, www.sopitas.com, 2029, 181010, Mexico), (39, www.reddit.com, 9,
649176107, Mexico), (41, www.mega.nz, 291, 8871047, Mexico), (42,
www.microsoft.com, 47, 73327159, Mexico), (43, www.eluniversal.com.mx, 2734,
1046065, Mexico), (44, www.pornhub.com, 38, 68917274, Mexico), (45,
www.wikia.com, 69, 66045720, Mexico), (46, www.t.co, 32, 71840104, Mexico),
(47, www.ebay.com, 36, 182446176, Mexico), (48, www.pinterest.com, 60, 50641889,
Mexico), (49, www.youtube-mp3.org, 293, 4261179, Mexico), (50, www.mundo.com,
1733, 1153748, Mexico), (2, www.google.com, 1, 4192159833, Afghanistan),
(4, www.yahoo.com, 7, 1234567, France), (3, www.youtube.com, 2, 2679159025,
Afghanistan), (4, www.facebook.com, 3, 1082985733, Afghanistan), (5,
www.yahoo.com, 6, 383352336, Afghanistan), (6, www.acbar.org, 8011, 712760,
Afghanistan), (7, www.bbc.com, 104, 24690228, Afghanistan), (8,
www.wikipedia.org, 5, 397197324, Afghanistan), (9, www.jobs.af, 17003, 169355,
Afghanistan), (10, www.live.com, 15, 186702228, Afghanistan), (11,
www.bbc.com, 104, 690228, Afghanistan)]
Output:
[(2, www.youtube.com, 2, 2679159025, Mexico), (3, www.google.com, 1,
4186314171, Mexico), (4, www.facebook.com, 3, 1082985733, Mexico), (5,
www.live.com, 15, 186702228, Mexico), (6, www.yahoo.com, 6, 383352336,
Mexico), (7, www.wikipedia.org, 5, 397197324, Mexico), (8, www.blogspot.mx,
468, 3999662, Mexico), (9, www.mercadolibre.com.mx, 540, 12445106, Mexico), (10,
www.perfecttoolmedia.com, 440, 7230160, Mexico), (11, www.caliente.mx, 848,
2440819, Mexico), (12, www.msn.com, 39, 71019660, Mexico), (13,
www.twitter.com, 12, 206597988, Mexico), (14, www.netflix.com, 31, 58866838,
Mexico), (15, www.ntd.tv, 37, 216392, Mexico), (16, www.instagram.com, 17,
156961142, Mexico), (17, www.adf.ly, 106, 22859613, Mexico), (18,
www.whatsapp.com, 72, 25618356, Mexico), (19, www.xvideos.com, 62, 146500484,
Mexico), (20, www.debate.com.mx, 1108, 272284, Mexico), (21, www.uptodown.com,
104, 4984195, Mexico)]
Function Test top_sites_per_views:
Input
[(1, www.gazetaexpress.com, 695, 262029, Albania), (1, www.youtube.com, 2,
2679159025, Algeria), (1, www.google.com, 1, 4192159833, Andorra), (1,
www.google.com.ar, 97, 55533785, Argentina), (1, www.youtube.com, 2, 2679159025,
Armenia), (1, www.google.com, 1, 4192159833, Aruba), (1, www.google.com.au,
70, 109477960, Australia), (1, www.google.at, 192, 30551273, Austria), (2,
www.google.com, 1, 4192159833, Afghanistan), (2, www.joq.al, 1375, 152807,
Albania), (2, www.google.dz, 243, 18372811, Algeria), (2, www.google.ad,
29284, 266275, Andorra), (2, www.youtube.com, 2, 2679159025, Angola), (2,
www.youtube.com, 2, 2679159025, Argentina), (2, www.google.am, 3108, 1519872,
Armenia), (2, www.youtube.com, 2, 2679159025, Aruba), (2, www.youtube.com, 2,
2679159025, Australia), (2, www.youtube.com, 2, 2679159025, Austria), (3,
www.youtube.com, 2, 2679159025, Afghanistan), (3, www.google.com, 1, 4192159833,
Albania), (3, www.facebook.com, 3, 1082985733, Algeria), (3, www.youtube.com,
2, 2679159025, Andorra), (3, www.google.com, 1, 4192159833, Angola), (3,
www.facebook.com, 3, 1082985733, Argentina), (3, www.ok.ru, 52, 46811442,
Armenia), (3, www.facebook.com, 3, 1082985733, Aruba), (3, www.google.com, 1,
4192159833, Australia), (3, www.google.com, 1, 4192159833, Austria),
Many lines removed for brevity. The complete input is available on the project page.
(48, www.thewhizproducts.com, 345, 5384162, Angola), (48, www.pornhub.com, 38,
68917274, Argentina), (48, www.1plus1tv.ru, 3957, 620460, Armenia), (48,
www.imgur.com, 36, 82710984, Aruba), (48, www.theguardian.com, 138, 16460152,
Australia), (48, www.onclkds.com, 44, 49754784, Austria), (49, www.xnxx.com,
136, 50749572, Afghanistan), (49, www.shkodra.news, 28489, 6121, Albania), (49,
www.doubleclick.net, 228, 6255883, Algeria), (49, www.adbooth.com, 844, 2194687,
Andorra), (49, www.thepiratebay.org, 98, 47703675, Angola), (49,
www.microsoft.com, 47, 73327159, Argentina), (49, www.asekose.am, 54265, 51010,
Armenia), (49, www.watchfree.to, 835, 4809851, Aruba), (49, www.anz.com, 3086,
1353732, Australia), (49, www.imdb.com, 55, 68968551, Austria), (50,
www.klankosova.tv, 15106, 12752, Albania), (50, www.clicksgear.com, 99,
21449476, Algeria), (50, www.stackoverflow.com, 51, 76147434, Andorra), (50,
www.mozilla.org, 146, 10717045, Angola), (50, www.pelispedia.tv, 1664, 1994703,
Argentina), (50, www.ivideo.am, 114244, 17161, Armenia), (50,
www.el-nacional.com, 990, 599949, Aruba), (50, www.bing.com, 40, 78506210, Australia)]
Output
[(1, www.google.com, 1, 4192159833, Andorra), (1, www.youtube.com, 2,
2679159025, Algeria), (3, www.facebook.com, 3, 1082985733, Algeria), (5,
www.reddit.com, 9, 649176107, Australia), (5, www.amazon.com, 11, 399350988,
Aruba), (6, www.wikipedia.org, 5, 397197324, Aruba), (5, www.yahoo.com, 6,
383352336, Afghanistan), (7, www.vk.com, 16, 237979961, Armenia), (5,
www.twitter.com, 12, 206597988, Andorra), (4, www.live.com, 15, 186702228,
Aruba), (19, www.ebay.com, 36, 182446176, Aruba), (11, www.instagram.com, 17,
156961142, Andorra), (17, www.xvideos.com, 62, 146500484, Angola), (10,
www.aliexpress.com, 42, 144346819, Armenia), (10, www.linkedin.com, 24,
135219383, Andorra), (11, www.yandex.ru, 28, 121682061, Armenia), (26,
www.tumblr.com, 48, 108349851, Australia), (23, www.xhamster.com, 75, 91325643,
Austria), (17, www.imgur.com, 36, 82710984, Australia), (33, www.bing.com, 40,
78506210, Andorra)]
Test 1:
Web Data
Input a filename: Web_Scrapped_websites.csv
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: 1
Top 20 by Country
Country: United States
Website                           Traffic Rank      Average Daily Page Views
www.google.com                               1                 4,186,314,171
www.youtube.com                              2                 2,679,159,025
www.facebook.com                             3                 1,082,985,733
www.reddit.com                               9                   649,176,107
www.amazon.com                              11                   399,350,988
www.wikipedia.org                            5                   397,197,324
www.yahoo.com                                6                   383,352,336
www.twitter.com                             12                   206,597,988
www.ebay.com                                36                   182,446,176
www.netflix.com                             31                    58,866,838
www.imgur.com                               36                    82,710,984
www.ntd.tv                                  37                       216,392
www.linkedin.com                            24                   135,219,383
www.instagram.com                           17                   156,961,142
www.craigslist.org                          95                    71,173,494
www.diply.com                               63                     4,184,263
www.live.com                                15                   186,702,228
www.twitch.tv                               54                    63,635,667
www.microsoftonline.com                     62                    40,406,853
www.office.com                              56                    45,278,239
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: q

Test 2:
Web Data
Input a filename: Web_Scrapped
Error: file not found.
Input a filename: Web_Scrapped_websites
Error: file not found.
Input a filename: Web_Scrapped_websites.csv
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: 1
Top 20 by Country
Country: Afghanistan
Website                           Traffic Rank      Average Daily Page Views
www.google.com                               1                 4,192,159,833
www.youtube.com                              2                 2,679,159,025
www.facebook.com                             3                 1,082,985,733
www.yahoo.com                                6                   383,352,336
www.acbar.org                             8011                       712,760
www.bbc.com                                104                    24,690,228
www.wikipedia.org                            5                   397,197,324
www.jobs.af                              17003                       169,355
www.live.com                                15                   186,702,228
www.espncricinfo.com                       323                     9,588,936
www.blogfa.com                             560                     3,486,885
www.azadiradio.com                       27256                       190,779
www.msn.com                                 39                    71,019,660
www.varzesh3.com                           173                    14,896,181
www.ask.com                                112                    21,567,415
www.myway.com                              225                    11,127,268
www.booksecure.net                       11228                       503,985
www.savefrom.net                           118                    13,229,654
www.onclkds.com                             44                    49,754,784
www.instagram.com                           17                   156,961,142
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: q

Test 3:
Web Data
Input a filename: Web_Scrapped_websites.csv
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: 2
Search: yyy
Websites Matching Query
None found
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: 2
Search: SPORTS
Websites Matching Query
www.sportsport.ba
www.themasports.com
www.sports.ru
www.skysports.com
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: 2
Search: sky
Websites Matching Query
www.sky.mk
www.skyband.mw
www.skysports.com
www.blogsky.com
www.sky.it
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: q

Test 4:
Web Data
Input a filename: Web_Scrapped_websites.csv
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: 3
Top 20 by Page View
Website                        Ave Daily Page Views
www.google.com                        4,192,159,833
www.youtube.com                       2,679,159,025
www.facebook.com                      1,082,985,733
www.baidu.com                           741,476,028
www.reddit.com                          649,176,107
www.amazon.com                          399,350,988
www.wikipedia.org                       397,197,324
www.yahoo.com                           383,352,336
www.qq.com                              314,024,840
www.taobao.com                          258,029,555
www.sohu.com                            251,158,338
www.vk.com                              237,979,961
www.twitter.com                         206,597,988
www.jd.com                              201,418,937
www.live.com                            186,702,228
www.ebay.com                            182,446,176
www.weibo.com                           179,318,234
www.sina.com.cn                         168,190,966
www.instagram.com                       156,961,142
www.360.cn                              151,628,258
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: 4
Incorrect input. Try again.
Choose
(1) Top sites by country
(2) Search by web site name
(3) Top sites by views
(q) Quit
Choice: Q

Grading Rubric
General Requirements:
( 4 pts ) Coding Standard 1-9 (descriptive comments, source code headers, function headers, etc.)
Implementation:
( 2 pts ) open_file function (No Mimir test)
( 5 pts ) read_file function
( 5 pts ) remove_duplicate_sites function
( 5 pts ) top_sites_per_country function
( 5 pts ) top_sites_per_views function
( 4 pts ) Test 1
( 5 pts ) Test 2
( 5 pts ) Test 3
( 5 pts ) Test 4
