Participatory monitoring is a novel form of data collection based on observations gathered by local residents. Such a data collection process can rely on a variety of tools, ranging from amateur equipment to social media (e.g., Twitter, Facebook), on which people post comments and observations concerning various environmental and socio-economic processes. Participatory monitoring is quickly permeating multiple scientific domains as an alternative, or addition, to monitoring carried out by professional scientists. During a flooding event, for example, agencies, utilities, and service providers could rely on both radar data and tweets to pinpoint the most affected areas. Are tweets reliable? How can one extract useful information from tweets? This is where data analytics can give us an edge.

In this project, you are tasked with automatically inferring the information contained in a large number of tweets concerning the state of the weather. Specifically, your task is to develop an algorithm that determines, with the highest accuracy, what sort of weather the tweets reference. The analysis must be carried out in the R computing environment. You can use any R package.

2 Schedule

The schedule of events for this data competition is outlined in Table 1.

December 3, 2020	Announcement of the Data Competition
December 3, 2020 (17.00)	Publication of problem details and competition rules + Release of the training dataset + Release of the test dataset
December 11, 2020 (23.59)	Last opportunity for submitting the results on Kaggle
December 13, 2020 (23.59)	Submission of reports, code, and peer evaluation form

Table 1: Schedule of events.

Further information about the data competition will be published on Kaggle, which will act as a portal for downloading the data and uploading the predictions for the test dataset. The report, code, and peer evaluation form should be submitted on eDimension through a dedicated link.

3 Problem description

As mentioned in Section 1, your task is to develop an algorithm that determines what sort of weather the tweets reference. Specifically, the challenge is to determine whether a tweet has a negative, neutral, or positive sentiment. The following datasets are provided:

Training dataset (csv): 22,500 tweets with the corresponding classification (sentiment). The integers 1, 2, and 3 indicate negative, neutral, and positive sentiment, respectively.
Test dataset (csv): 7,500 tweets. Naturally, this dataset has no labels. It will be used to quantify the performance of the algorithms.
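
As a rough illustration of the label encoding described above, the following base R sketch maps the integer codes to readable class names and applies a very simple tokenisation. The toy tweets and the column names `tweet` and `sentiment` are assumptions made for this example; the actual column names in the provided files are not stated here.

```r
# Toy stand-in for the training data. The column names `tweet` and
# `sentiment` are assumptions, as are the example tweets themselves.
train <- data.frame(
  tweet = c("Stuck indoors, it is pouring again",
            "Forecast for tomorrow: cloudy",
            "Gorgeous sunshine this morning!"),
  sentiment = c(1L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Map the integer codes 1, 2, 3 to readable class labels.
train$sentiment <- factor(train$sentiment,
                          levels = c(1, 2, 3),
                          labels = c("negative", "neutral", "positive"))

# Basic normalisation: lower-case, then replace anything that is not
# a letter or a space with a space.
clean <- gsub("[^a-z ]", " ", tolower(train$tweet))

# Split each tweet into word tokens (one character vector per tweet).
tokens <- strsplit(trimws(clean), " +")

# Class distribution, useful for spotting imbalance before modelling.
class_counts <- table(train$sentiment)
print(class_counts)
```

Keeping the labels as a factor means most R modelling functions will treat the problem as three-class classification rather than regression on the integers 1 to 3.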

The performance of the algorithms will then be evaluated on their ability to correctly classify the sentiment of each tweet in the test dataset. In particular, the evaluation will be based on the accuracy metric, defined as the ratio between the number of correctly classified samples and the total number of samples. Kaggle will calculate the accuracy on two subsets of the test dataset, named public and private. The results on the public subset will be available during the competition (public leaderboard), while the results on the private one will be available at the end of the competition (private leaderboard).
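
The accuracy metric can be sketched in a few lines of base R. The helper name `accuracy` and the label vectors below are hypothetical and only illustrate the definition; this is not Kaggle's own scoring code.

```r
# Accuracy: number of correctly classified samples divided by the
# total number of samples.
accuracy <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual), length(actual) > 0)
  mean(predicted == actual)
}

# Illustrative labels (1 = negative, 2 = neutral, 3 = positive); not real data.
actual    <- c(1, 2, 3, 3, 2, 1, 2, 2)
predicted <- c(1, 2, 3, 2, 2, 1, 3, 2)

acc <- accuracy(predicted, actual)
print(acc)  # 6 of 8 labels match, so 0.75

# Confusion matrix: rows are true classes, columns are predicted classes.
conf_mat <- table(actual = actual, predicted = predicted)
print(conf_mat)
```

Since the test labels are withheld, a check like this would typically be run on a held-out portion of the training data to estimate leaderboard performance before submitting.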
