FINA 5240: FinTech Analytics Assignment 5
Halis Sak October 17, 2019
Question. In the previous homework assignment we used the tf-idf method to predict labels from the content of ICO (Initial Coin Offering) whitepapers. Now we want to use feed-forward neural networks for the same classification problem. Please use Python to complete the following tasks.
a) We need a tokenizer to split the content of the documents into words. Please use the nltk package (follow the steps at https://pythonspot.com/tokenizing-words-and-sentences-with-nltk/ to download all the required packages). After completing the installation of nltk, create a new Pandas dataframe with columns [tok_content, label]. Tokenize the content of the whitepapers in ICOData.csv using the word_tokenize function of nltk and store the result in the tok_content column of the new dataframe. The label column should hold the label of each document in ICOData.csv.
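As a starting point for part a), here is a minimal sketch of the tokenization step. It assumes the raw text and labels in ICOData.csv live in columns named content and label; these names are guesses and should be adjusted to the actual file. The helper name build_token_frame is hypothetical.

```python
import pandas as pd
from nltk.tokenize import word_tokenize


def build_token_frame(df, text_col="content", label_col="label",
                      tokenizer=word_tokenize):
    """Return a new dataframe with columns [tok_content, label].

    text_col and label_col are assumed column names, not taken from
    the assignment; change them to match ICOData.csv.
    """
    return pd.DataFrame({
        # Each cell of tok_content holds the list of tokens of one whitepaper.
        "tok_content": df[text_col].astype(str).map(tokenizer),
        "label": df[label_col],
    })
```

With the nltk tokenizer data installed (for example via nltk.download("punkt")), the sketch could be used as df_tok = build_token_frame(pd.read_csv("ICOData.csv")).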
b) We can use the gensim package to create vectors for the tokens of our whitepapers. (We used this package for news articles in Mandarin in Lectures 3 and 4.) First, train a word2vec model on the tokens of our whitepapers using gensim. Then, find the words most similar to Bitcoin.
c) Construct a mapping from the tokens in our dictionary to integers, as we did in the lecture notes. Then split the new dataframe into two groups, training and testing (df_train and df_test). The first 130 rows of the data should be in df_train and the rest in df_test.
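One possible way to approach part c) is sketched below. The lecture notes may build the mapping differently; this version assigns ids in order of first appearance and reserves 0 for padding, which is an assumption. The function names are hypothetical.

```python
import pandas as pd


def build_vocab(token_lists):
    """Map each distinct token to a positive integer id.

    Ids start at 1 so that 0 stays free for padding (an assumption,
    not necessarily the convention used in the lecture notes).
    """
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def split_frame(df, n_train=130):
    """Put the first n_train rows in df_train and the rest in df_test."""
    df_train = df.iloc[:n_train].reset_index(drop=True)
    df_test = df.iloc[n_train:].reset_index(drop=True)
    return df_train, df_test
```

The split is positional rather than random, matching the instruction that the first 130 rows form the training set.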
d) We want to fit a two-layer feed-forward neural network to our data as we did in Lecture 3. Please set the parameters of the model. The maximum number of tokens, mlen, can be set to 3000. You are welcome to experiment with the hyperparameters of the model.
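The Lecture 3 code is not reproduced here, so the following is only a sketch of what a two-layer feed-forward classifier could look like. It assumes PyTorch, an embedding layer in front of the two linear layers, and two output classes; the framework, the architecture details, the class name TwoLayerNet, and every default value except mlen = 3000 are assumptions.

```python
import torch
from torch import nn


class TwoLayerNet(nn.Module):
    """Embedding followed by two linear layers (illustrative architecture)."""

    def __init__(self, vocab_size, mlen=3000, embed_dim=50,
                 hidden_dim=64, n_classes=2):
        super().__init__()
        # Row 0 of the embedding table is reserved for padding tokens.
        self.embed = nn.Embedding(vocab_size + 1, embed_dim, padding_idx=0)
        self.fc1 = nn.Linear(mlen * embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):
        # x: integer token ids of shape (batch, mlen)
        e = self.embed(x)                        # (batch, mlen, embed_dim)
        h = torch.relu(self.fc1(e.flatten(1)))   # (batch, hidden_dim)
        return self.fc2(h)                       # unnormalized class scores
```

The embedding size, hidden size, and number of classes are natural hyperparameters to experiment with; n_classes should match the number of distinct labels in ICOData.csv.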
e) Finally, we want to train the model and compute the classification accuracy. The Python code that I wrote for Lecture 3 can be used mostly without any change. However, the data and target lines of the code should be changed. Please make these changes appropriately and report the classification accuracy.
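For part e), the data and target lines could be built along these lines. This is a sketch that assumes a token-to-integer dictionary with 0 reserved for padding and that unknown tokens are also mapped to 0; the helper names encode_and_pad and accuracy are hypothetical and not part of the Lecture 3 code.

```python
import numpy as np


def encode_and_pad(token_lists, vocab, mlen=3000):
    """Turn token lists into an integer array of shape (n_docs, mlen).

    Documents longer than mlen are truncated, shorter ones are padded
    with 0. Tokens missing from vocab also become 0 (an assumption).
    """
    data = np.zeros((len(token_lists), mlen), dtype=np.int64)
    for i, tokens in enumerate(token_lists):
        ids = [vocab.get(tok, 0) for tok in tokens[:mlen]]
        data[i, :len(ids)] = ids
    return data


def accuracy(predicted, target):
    """Fraction of predictions that equal the true labels."""
    predicted = np.asarray(predicted)
    target = np.asarray(target)
    return float((predicted == target).mean())
```

The data array would come from encode_and_pad applied to the tok_content column of df_train or df_test, and target from the corresponding label column. The accuracy to report is the one computed on df_test.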