Covid-19 Fact Checking
In this project, you will build a Naive Bayes bag-of-words (NB-BOW) approach to determine whether a tweet contains a verifiable factual claim. You will then compare the output of the NB model to that of a Word2Vec LSTM approach given to you (see Section 3).
1 The Dataset
Download the Assignment 3 dataset available on Moodle. This dataset, created by [Alam et al., 2020], contains a collection of tweets related to Covid-19 collected on March 9–10, 2020 and March 20–25, 2020. These tweets have been labeled for 7 different questions, such as:
Q1 Does the tweet contain a verifiable factual claim?
Q2 To what extent does the tweet appear to contain false information?
Q6 Is the tweet harmful for society and why?
For example, an instance of the dataset contains:
tweet id | text | q1 label | q2 label | q3 label | q4 label | q5 label | q6 label | q7 label
1240716889162018816 | Can y'all please just follow the government's instructions so we can knock this COVID-19 out and be done?! I feel like a kindergartner that keeps losing more recess time because one or two kids can't follow directions. | no | NA | NA | NA | NA | no not harmful | no not interesting
In this assignment, we will only use the classification of the first question (q1 label), whose labels are binary (yes/no):
- yes: the tweet contains a verifiable factual claim
- no: the tweet does not contain a verifiable factual claim
The dataset is already split into a training set of 400 instances and a test set of 55 instances.
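To make the setup concrete, below is a minimal loading sketch. It assumes the Moodle files are tab-separated with a header row; the file names and column keys used here are hypothetical, so match them to the actual files.

import csv

def load_tweets(path):
    """Return a list of (tweet_id, text, q1_label) tuples."""
    rows = []
    with open(path, encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            # Hypothetical column keys; check the header row of the real files.
            rows.append((row["tweet_id"], row["text"], row["q1_label"]))
    return rows

train = load_tweets("covid_training.tsv")    # expected: 400 instances
test = load_tweets("covid_test_public.tsv")  # expected: 55 instances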
2 The Naive Bayes Classifier (NB-BOW)
You will code a Multinomial Naive Bayes classifier yourself.
2.1 Parameters
Your Naive Bayes Classifier should use the following parameters:
Vocabulary: First fold the training set to lower case, then build a list of all words appearing in the training set. This list will constitute your vocabulary V, which will be used as features. To identify the words, tokenise the tweets on spaces only; use the words as features and the word frequencies as feature values (see the sketch after the list below).
Experiment with 2 versions of the model:
Original Vocabulary: one model where all words appearing in the training set are used as features. Let's call this model NB-BOW-OV.
Filtered Vocabulary: a second model where you filter out the words that appear only once in the training set, so that V contains only the words that appear at least 2 times in the training set. Let's call this model NB-BOW-FV.
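A minimal sketch of the vocabulary construction, reusing the hypothetical load_tweets helper from Section 1; the only difference between the two models is the minimum count.

from collections import Counter

def build_vocab(tweets, min_count):
    """Lower-case first, then tokenise on spaces only, per the spec."""
    counts = Counter()
    for _, text, _ in tweets:
        counts.update(text.lower().split(" "))
    return {w for w, c in counts.items() if c >= min_count}

vocab_ov = build_vocab(train, min_count=1)  # NB-BOW-OV: every word
vocab_fv = build_vocab(train, min_count=2)  # NB-BOW-FV: words seen at least twice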
Smoothing: to smooth, use additive smoothing (add-δ) with δ = 0.01.
Log: To avoid arithmetic underflow, work in log10 space.
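Putting these parameters together, here is one possible (not prescribed) training and scoring sketch, building on the hypothetical helpers above.

import math
from collections import Counter, defaultdict

DELTA = 0.01  # additive smoothing value from Section 2.1

def train_nb(tweets, vocab):
    """Return log10 class priors and log10 word likelihoods per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for _, text, label in tweets:
        class_counts[label] += 1
        for w in text.lower().split(" "):
            if w in vocab:
                word_counts[label][w] += 1
    total = sum(class_counts.values())
    log_prior = {c: math.log10(n / total) for c, n in class_counts.items()}
    log_cond = {}
    for c in class_counts:
        # Add-delta smoothing: delta per vocabulary word in the denominator.
        denom = sum(word_counts[c].values()) + DELTA * len(vocab)
        log_cond[c] = {w: math.log10((word_counts[c][w] + DELTA) / denom)
                       for w in vocab}
    return log_prior, log_cond

def score(text, c, vocab, log_prior, log_cond):
    """log10 P(c) plus the sum of log10 P(w|c) over in-vocabulary tokens."""
    s = log_prior[c]
    for w in text.lower().split(" "):
        if w in vocab:
            s += log_cond[c][w]
    return s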
2.2 Output
For both your models (NB-BOW-OV & NB-BOW-FV), your program should create 2 output files: a trace file (see Section 2.2.1) and one overall evaluation file (see Section 2.2.2).
2.2.1 Trace Files
Given a test set, your program should create trace files called trace_NB-BOW-OV.txt and trace_NB-BOW-FV.txt.
The trace file should contain:
- the tweet ID as indicated in the test file, followed by 2 spaces
- the most likely class as determined by your model (i.e. the label yes, no), followed by 2 spaces
- the score of the most likely class (in scientific notation), followed by 2 spaces
- the correct class as indicated in the test file, followed by 2 spaces
- the label correct or wrong (depending on the case), followed by a carriage return.
For example, the file trace_NB-BOW-OV.txt could contain:
1235714668833828864  yes  -1.23E-7  no  wrong
1235545254347984897  no  -3.21E-7  no  correct
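A sketch of a matching trace writer, reusing the hypothetical score helper from Section 2.1. Note that "\n" is written where the spec says carriage return; adjust the line ending if a literal "\r" is expected.

def write_trace(path, test, vocab, log_prior, log_cond):
    with open(path, "w", encoding="utf-8") as f:
        for tweet_id, text, gold in test:
            scores = {c: score(text, c, vocab, log_prior, log_cond)
                      for c in log_prior}
            pred = max(scores, key=scores.get)
            verdict = "correct" if pred == gold else "wrong"
            # Fields separated by 2 spaces; score in scientific notation.
            f.write(f"{tweet_id}  {pred}  {scores[pred]:.2E}  {gold}  {verdict}\n")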
2.2.2 Overall Evaluation Files
In addition to the trace files, create text files called eval_NB-BOW-OV.txt and eval_NB-BOW-FV.txt summarising the performance of the model on the initial test set given on Moodle. The file should indicate the model's:
- accuracy (Acc), carriage return
- per-class precision (yes-P, no-P) separated by 2 spaces, then a carriage return
- per-class recall (yes-R, no-R) separated by 2 spaces, then a carriage return,
- per-class F1-measure (yes-F, no-F) separated by 2 spaces, then a carriage return.
For example, the file eval_NB-BOW-OV.txt could contain:
0.6666
0.7777  0.5555
0.7777  0.5555
0.7777  0.5555
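A sketch of the evaluation writer, assuming parallel lists of gold labels and predictions; the 4-decimal formatting merely mirrors the example above.

def write_eval(path, golds, preds, labels=("yes", "no")):
    """Accuracy, then per-class precision, recall and F1 in the format above."""
    acc = sum(g == p for g, p in zip(golds, preds)) / len(golds)
    prec, rec, f1 = {}, {}, {}
    for c in labels:
        tp = sum(p == c and g == c for g, p in zip(golds, preds))
        pred_c = sum(p == c for p in preds)  # instances predicted as class c
        gold_c = sum(g == c for g in golds)  # instances actually of class c
        prec[c] = tp / pred_c if pred_c else 0.0
        rec[c] = tp / gold_c if gold_c else 0.0
        pr = prec[c] + rec[c]
        f1[c] = 2 * prec[c] * rec[c] / pr if pr else 0.0
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{acc:.4f}\n")
        f.write(f"{prec['yes']:.4f}  {prec['no']:.4f}\n")
        f.write(f"{rec['yes']:.4f}  {rec['no']:.4f}\n")
        f.write(f"{f1['yes']:.4f}  {f1['no']:.4f}\n")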
2.3 Programming Environment
You must use Python 3.8. In addition, you must use GitHub (make sure your project is private while developing).
3 The LSTM Classifier (LSTM-W2V)
Your wonderful TAs have implemented a classifier for the same task using an LSTM and Word2Vec embeddings (LSTM-W2V). This code generates the output files (see Section 4.1) for you, so its performance can be compared to that of your NB-BOW approach.
- Download embedding #6 from http://vectors.nlpl.eu/repository. Note that the download is 606MB.
- Download the code available at https://gitlab.com/Feasinde/lstm-for-covid-disinformation.
- Place both downloads in the same folder.
- Run the code and generate the corresponding output files. Note that the code may take a good 5 to 10 minutes to run.
4 Deliverables
The submission of the assignment will consist of 2 deliverables:
- The code & output files
- The demo (8-minute presentation & Q/A)
4.1 The Code & Output files
Submit all files necessary to run your code, in addition to a readme.md containing specific and complete instructions on how to run your experiments. You do not need to submit the datasets. If the instructions in your readme file do not work, are incomplete, or a file is missing, you will not be given the benefit of the doubt. Generate one set of output files for each model as indicated in Section 2.2.
4.2 The Demos
You will have to demo your assignment for 12 minutes. Regardless of the demo time, you will demo the program that was uploaded as the official submission. The schedule of the demos will be posted on Moodle. The demos will consist of 2 parts: a presentation (8 minutes) and a Q/A part (4 minutes). Note that the demos will be recorded.
4.2.1 The Presentation
Prepare an 8-minute presentation to analyse and compare the performance of your models. The intended audience of your presentation is your TAs; hence, there is no need to explain the theory behind the models. Your presentation should focus on your work and the comparison of the performance of the 3 models.
Your presentation should contain at least the following:
- An analysis of the initial dataset given on Moodle. If there is anything particular about these datasets that might have an impact on the performance of some models, explain it.
- An analysis of the difference between the vocabulary of the NB-BOW-OV and NB-BOW-FV models. What is the size of V in each model? Did the reduction in V lead to a significant difference in performance? Explain.
- An analysis of the results of all 3 models. In particular, compare and contrast the performance of each model with the others.
- In the case of team work, a description of the responsibilities and contributions of each team member.
Please note that your presentation must be analytical. This means that, in addition to stating the facts (e.g. the F1 has this value), you should also analyse them, i.e. explain why some metric seems more appropriate than another, or why your model did not do as well as expected. Tables, graphs and contingency tables to back up your claims are very welcome here.
Any material used for the presentation (slides, etc.) must be uploaded on EAS before the due date.
4.2.2 Q/A
After your presentation, your TA will proceed with a 4-minute question period. Each student will be asked questions on the code/assignment and will be required to answer the TA satisfactorily. In particular, each member should know what each parameter that you experimented with represents, and its effect on the performance. Hence, every member of the team is expected to attend the demo.
In addition, your TA may give you a new dataset and ask you to train or run your models on this dataset. The output files generated by your program will have to be uploaded on EAS during your demo.
5 Evaluation Scheme
Students in teams can be assigned different grades based on their individual contribution to the project. Individual grades will be based on:
- a peer-evaluation done after the submission.
- the contribution of each student as indicated on GitHub.
- the Q/A of each student during the demo.
The team grade will be based on:
Code | functionality, proper use of the datasets, design, programming style | 11
Output with initial datasets | correctness and format | 1.5
Demo Presentation | depth of the analysis, clarity and conciseness, presentation, time-management | 4
Demo QA | correct and clear answers to questions, knowledge of the program | 2
Output with demo-dataset | correctness and format | 1.5
Total | | 20