[Solved] 40.016 Week13 -Kaggle competition


Participatory monitoring is a novel form of data collection based on observations gathered by local residents. Such a data collection process can rely on a variety of tools, ranging from amateur equipment to social media (e.g., Twitter, Facebook), on which people post comments and observations concerning various environmental and socio-economic processes. Participatory monitoring is quickly permeating multiple scientific domains as an alternative, or addition, to monitoring carried out by professional scientists. During a flooding event, for example, agencies, utilities, and service providers could rely on both radar data and tweets to pinpoint the most affected areas. Are tweets reliable? How can one extract useful information from tweets? This is where data analytics can give us an edge.

In this project, you are tasked with automatically inferring the information contained in a large number of tweets concerning the state of the weather. Specifically, your task is to develop an algorithm that determines, with the highest accuracy, what sort of weather the tweets reference. The analysis must be carried out in the R computing environment. You can use any R package.

2 Schedule

The schedule of events for this data competition is outlined in Table 1.

Date (time)                  Event
December 3, 2020             Announcement of the Data Competition
December 3, 2020 (17.00)     Publication of problem details and competition rules
                             + Release of the training dataset
                             + Release of the test dataset
December 11, 2020 (23.59)    Last opportunity for submitting the results on Kaggle
December 13, 2020 (23.59)    Submission of reports, code, and peer evaluation form

Table 1: Schedule of events.

Further information about the data competition will be published on Kaggle, which will act as a portal for downloading the data and uploading the predictions for the test dataset. Reports, code, and the peer evaluation form should be submitted on eDimension through a dedicated link.

3 Problem description

As mentioned in Section 1, your task is to develop an algorithm that determines what sort of weather the tweets reference. Specifically, the challenge is to determine whether a tweet has a negative, neutral, or positive sentiment. The following datasets are provided:

  • csv: 22,500 tweets with the corresponding classification / sentiment. The integers 1, 2, and 3 indicate negative, neutral, and positive sentiment, respectively.
  • csv: 7,500 tweets. Naturally, this dataset has no labels. It will be used to quantify the performance of the algorithms.
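To make the label coding concrete, here is a minimal base-R sketch of how the integer codes might be turned into readable classes and how a trivial majority-class baseline could be derived. The data frame is a toy stand-in: the tweets, the column names `tweet` and `sentiment`, and the baseline itself are assumptions for illustration, not part of the competition material.

```r
# Toy stand-in for the training data. The column names `tweet` and
# `sentiment` are assumptions, not the competition's actual schema.
train <- data.frame(
  tweet = c("Rain again, I hate this weather",
            "Cloudy with a high of 20",
            "Gorgeous sunshine today!",
            "Forecast says wind later"),
  sentiment = c(1, 2, 3, 2),
  stringsAsFactors = FALSE
)

# Map the integer codes 1, 2, 3 to readable class labels.
train$sentiment <- factor(train$sentiment,
                          levels = c(1, 2, 3),
                          labels = c("negative", "neutral", "positive"))

# Class distribution: a useful first check before any modelling.
class_counts <- table(train$sentiment)
print(class_counts)

# Majority-class baseline: always predict the most frequent class.
majority_class <- names(class_counts)[which.max(class_counts)]
print(majority_class)
```

A baseline of this kind only gives a floor against which a real text classifier can be compared; any serious model should beat it.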

The performance of the algorithms will then be evaluated on their ability to correctly classify the sentiment of each tweet in the test dataset. In particular, the evaluation will be based on the accuracy metric, defined as the ratio between the number of correctly classified samples and the total number of samples. Kaggle will calculate the accuracy on two subsets of the test dataset, named public and private. The results on the public subset will be available during the competition (public leaderboard), while the results on the private subset will be available at the end of the competition (private leaderboard).
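The accuracy metric described above can be sketched in a few lines of base R. The vectors `actual` and `predicted` are hypothetical toy values; this is not Kaggle's scoring code, only an illustration of the definition.

```r
# Accuracy: number of correctly classified samples divided by the
# total number of samples.
accuracy <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted), length(actual) > 0)
  mean(actual == predicted)
}

# Toy example using the coding 1 = negative, 2 = neutral, 3 = positive.
actual    <- c(1, 2, 3, 3, 2, 1, 2, 3)
predicted <- c(1, 2, 3, 2, 2, 1, 3, 3)

acc <- accuracy(actual, predicted)  # 6 of the 8 predictions match
print(acc)
```

Since true test labels are not released, a check like this is typically run on a held-out part of the training data to estimate leaderboard performance.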
