In this assignment, we’re interested in the main topics discussed on the /r/mcgill subreddit vs. the /r/concordia subreddit. We’ll do this using human annotation … and you’re the annotator.

First, let’s collect some Reddit posts (using the /new.json endpoint; details here). We’ll collect two data files: one from the McGill subreddit and one from the Concordia subreddit. For the purpose of this assignment, collect them manually. In other words, open the endpoint in a web browser and save the JSON dump to a file. You should end up with a mcgill.json file and a concordia.json file.

Write a script extract_to_tsv.py that accepts one of the files you collected from Reddit and writes a random selection of posts from that file to a TSV (tab-separated values) file. It should be run like this:

python3 extract_to_tsv.py -o <out_file> <json_file> <num_posts_to_output>

If <num_posts_to_output> is greater than the number of posts in the file, the script should simply output all of them. If the file has more than <num_posts_to_output> posts (which is likely), the script should randomly select <num_posts_to_output> of them and output only those. The output format (written to <out_file>) is three tab-separated columns: Name, title, coding
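Below is a minimal sketch of extract_to_tsv.py. It assumes the saved files are the unmodified Listing JSON returned by /new.json (posts under data.children[i].data). It also assumes the Name column holds each post's `name` field (its fullname, e.g. t3_abc123) and that the coding column is left empty for the annotator. Both are assumptions, not part of the assignment text. The helper names load_posts, select_posts, clean and write_tsv are hypothetical, and reading one JSON post per line is an extra fallback, not a stated requirement.

```python
import argparse
import json
import random


def load_posts(json_path):
    """Return a list of post dicts from a saved Reddit /new.json dump.

    Assumes the Listing shape {"data": {"children": [{"data": {...}}, ...]}}.
    Fallback (assumption): one JSON post per line, bare or wrapped as {"data": {...}}.
    """
    with open(json_path, encoding="utf-8") as f:
        text = f.read()
    try:
        listing = json.loads(text)
        if (isinstance(listing, dict) and isinstance(listing.get("data"), dict)
                and "children" in listing["data"]):
            return [child["data"] for child in listing["data"]["children"]]
    except json.JSONDecodeError:
        pass  # not a single JSON document; try one object per line
    posts = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            obj = json.loads(line)
            posts.append(obj.get("data", obj))
    return posts


def select_posts(posts, num_posts, rng=random):
    """Return every post if there are at most num_posts, else a random sample of num_posts."""
    if num_posts >= len(posts):
        return list(posts)
    return rng.sample(posts, num_posts)  # sampling without replacement: no duplicates


def clean(text):
    """Collapse runs of whitespace (including tabs and newlines) to single spaces."""
    return " ".join(str(text).split())


def write_tsv(posts, out_path):
    """Write a Name/title/coding header, then one row per post with coding left empty."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("Name\ttitle\tcoding\n")
        for post in posts:
            out.write(f"{clean(post.get('name', ''))}\t{clean(post.get('title', ''))}\t\n")


def main(argv=None):
    parser = argparse.ArgumentParser(description="Sample Reddit posts into a TSV for annotation.")
    parser.add_argument("-o", dest="out_file", required=True, help="TSV file to write")
    parser.add_argument("json_file", help="a saved /new.json dump, e.g. mcgill.json")
    parser.add_argument("num_posts_to_output", type=int)
    args = parser.parse_args(argv)
    if args.num_posts_to_output < 0:
        parser.error("num_posts_to_output must be non-negative")
    posts = load_posts(args.json_file)
    write_tsv(select_posts(posts, args.num_posts_to_output), args.out_file)


if __name__ == "__main__":
    main()
```

For example, `python3 extract_to_tsv.py -o mcgill.tsv mcgill.json 50` would write at most 50 randomly chosen McGill posts. Whitespace in titles is collapsed because a stray tab or newline inside a title would otherwise shift the TSV columns.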