A large entertainment and bar franchise (Hard Knocks) would like to determine whether now is a good time to expand and, if so, which country or region of the world is the best place to open next. Ultimately, they want insight into whether to open their next franchise, based on general public sentiment and on opportunities to maximize profits. They first want to gauge public opinion of the products they sell (beverages), which would indicate current views towards these products. In addition, they have a large dataset with sales data by region and country for different types of consumer goods (household, cosmetics, etc.), food, and beverages. You will be helping Hard Knocks make their decision in this project! The required tasks have been broken down into several segments, and you are required to start this week!
- Between March 04 and March 09 (max. 1 word per day), connect to Twitter on two separate days and retrieve 8,000 or more tweets containing one of the words from (a) and 8,000 or more tweets containing one of the words from (b) (16,000 tweets in total). Retrieve the tweets for the word from (a) on a separate day from those for (b).
  a) beverage or beer
  b) party or concert [4]
- For each set of tweets retrieved (a-b above), retain the following features only: text, screen_name, user_id, created_at, favourite_count, retweet_count, location, followers_count, friends_count, account_lang, lang.
- Remove all non-English tweets (you must indicate how many tweets were removed). [1]
- A tweet is considered a duplicate if its text is the same as that of another tweet. Remove all duplicate tweets (you must indicate how many tweets were removed). [2]
- Write the remaining tweet data to a .csv file. The filename should have the format <keyword>_<date>_<myname>. For example, the file for tweets on beverage retrieved on March 07 by Anderson would be: beverage_2021Mar07_Anderson.csv [1]
- Write code to review and show details of the tweets retrieved, including: the number of tweets (after doing 2a-c), the screen_name with the most followers, the tweet with the most retweets, and the location from which the most tweets originate. [7]
- Between March 10 and March 14, repeat tasks 1 and 2 above. For task 1, use the words that were not used during the March 04-09 tweet retrieval. That is, if you retrieved tweets using beverage, you should now use beer, and if you used party, you should now use concert for part 1b. Save the files using the same format (the names will differ because the dates will differ).
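
The cleaning, file-naming, and review steps above can be sketched as follows. This is a minimal illustration, not a required solution: it assumes the tweets have already been retrieved and converted to Python dictionaries keyed by the feature names listed in the brief, that English tweets carry the language code "en", and that the count fields are numeric. The function names (clean_tweets, csv_filename, write_csv, summarize) are hypothetical, and tweet retrieval itself is not shown.

```python
import csv
from collections import Counter
from datetime import date

# Features the brief asks to retain (names copied from the task list).
FEATURES = [
    "text", "screen_name", "user_id", "created_at", "favourite_count",
    "retweet_count", "location", "followers_count", "friends_count",
    "account_lang", "lang",
]


def clean_tweets(tweets):
    """Keep required features, drop non-English tweets, then drop duplicates.

    Returns (kept_rows, n_non_english_removed, n_duplicates_removed).
    Assumption: each tweet is a dict; missing features become None.
    """
    rows = [{k: t.get(k) for k in FEATURES} for t in tweets]
    english = [r for r in rows if r["lang"] == "en"]
    seen, unique = set(), []
    for r in english:
        if r["text"] not in seen:  # duplicate = identical text
            seen.add(r["text"])
            unique.append(r)
    return unique, len(rows) - len(english), len(english) - len(unique)


def csv_filename(keyword, day, name):
    """Build <keyword>_<date>_<myname>.csv (month abbreviation is locale-dependent)."""
    return f"{keyword}_{day.strftime('%Y%b%d')}_{name}.csv"


def write_csv(rows, path):
    """Write the cleaned rows to a CSV file with one column per feature."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FEATURES)
        writer.writeheader()
        writer.writerows(rows)


def summarize(rows):
    """Report the review details requested in the brief for a non-empty list."""
    top_user = max(rows, key=lambda r: r["followers_count"])
    top_tweet = max(rows, key=lambda r: r["retweet_count"])
    locations = Counter(r["location"] for r in rows if r["location"])
    return {
        "n_tweets": len(rows),
        "most_followed": top_user["screen_name"],
        "most_retweeted": top_tweet["text"],
        "top_location": locations.most_common(1)[0][0] if locations else None,
    }
```

For example, csv_filename("beverage", date(2021, 3, 7), "Anderson") produces the filename given in the brief under an English locale, and the two counts returned by clean_tweets are the figures the brief asks you to report for removed non-English and duplicate tweets.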