Imperial College London
Department of Mathematics
MSc in Mathematics and Finance, Academic year 2019–2020, Autumn term
MATH97231 Deep Learning Coursework (weight: 10%), 10 December 2019
General rules:
This coursework is to be completed individually. You are welcome to discuss issues related to the use of deep learning software with your classmates, but all analysis and testing should be your independent work.
Present the results in a written report (at most 7 pages). Include your code in an appendix (which may extend beyond 7 pages). Alternatively, the report can be a Jupyter notebook, but it should be submitted as a PDF, not as a .ipynb file.
You may use any publicly available (free or commercial) software and packages. Please indicate in your report which software (and packages) you have used.
There are two deliverables: the report/notebook (as discussed above) and a set of predictions (details given below). Hand in these two files via email to [email protected] (two separate files, no .zip files or similar).
Deadline: Friday, 20 December 2019, 4:00 pm UK time.
In this coursework you will use deep learning to predict high-frequency price changes of an undisclosed US stock. The compressed file DL-2019-CW-data.zip contains two CSV files Dataset_A.csv and Dataset_B_nolabels.csv.
Firstly, Dataset_A.csv contains a 100 000 × 14 array with the following information:
Column 1: the label, i.e. the direction of the midprice change (recall that $\text{midprice} = \frac{\text{bid price} + \text{ask price}}{2}$), coded as follows: 0 = down, 1 = up.
Columns 2–14: the features, all recorded prior to the midprice change corresponding to the label.
Column 2: Sell side, limit order book level 1, Price (in US dollars multiplied by 10 000), that is, the ask price.
Column 3: Sell side, limit order book level 1, Volume (in number of shares).
Column 4: Buy side, limit order book level 1, Price, that is, the bid price.
Column 5: Buy side, limit order book level 1, Volume.
Column 6: Sell side, limit order book level 2, Price.
Column 7: Sell side, limit order book level 2, Volume.
Column 8: Buy side, limit order book level 2, Price.
Column 9: Buy side, limit order book level 2, Volume.
Columns 10–14: five previous midprice change directions (0/1-coded like the labels).
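To illustrate how the column layout above maps onto code, the sketch below splits one row of Dataset_A.csv into its label and a few derived quantities. It assumes the row has already been parsed into numbers in the documented column order. The helper `row_features` and the feature names (`spread`, `imbalance_l1`, `imbalance_l12`, `up_fraction_last5`) are hypothetical choices for illustration, not something the brief asks for.

```python
PRICE_SCALE = 10_000  # prices are quoted in US dollars multiplied by 10 000


def row_features(row):
    """Split one 14-value row of Dataset_A.csv into (label, features).

    The derived features are illustrative choices, not part of the brief.
    """
    label = int(row[0])                         # column 1: 0 = down, 1 = up
    ask1, ask1_vol, bid1, bid1_vol = row[1:5]   # columns 2-5: book level 1
    _, ask2_vol, _, bid2_vol = row[5:9]         # columns 6-9: level 2 (only volumes used here)
    past_moves = [int(v) for v in row[9:14]]    # columns 10-14: previous directions

    midprice = (bid1 + ask1) / 2 / PRICE_SCALE  # in US dollars
    spread = (ask1 - bid1) / PRICE_SCALE        # in US dollars

    def imbalance(buy, sell):
        # Volume imbalance in [-1, 1]; positive means more resting buy volume.
        total = buy + sell
        return (buy - sell) / total if total else 0.0

    features = {
        "midprice": midprice,
        "spread": spread,
        "imbalance_l1": imbalance(bid1_vol, ask1_vol),
        "imbalance_l12": imbalance(bid1_vol + bid2_vol, ask1_vol + ask2_vol),
        "up_fraction_last5": sum(past_moves) / len(past_moves),
    }
    return label, features
```

Whether such hand-crafted features help a neural network beyond the raw 13 columns is an empirical question best settled on a validation set.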
The rows of this file have been randomly drawn from a larger data set, and they can be treated as 100 000 i.i.d. samples. No time series structure can be recovered from the data.
Secondly, Dataset_B_nolabels.csv contains a 10 001 × 13 array with a further 10 001 samples (drawn in the same way as those in Dataset_A.csv) but with the labels omitted.
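A minimal loader for both files, using only Python's standard `csv` module, might look as follows. It assumes comma-separated files without a header row (pass `skip_header=True` if the files turn out to have one) and checks the column counts stated above: 14 for Dataset_A.csv and 13 for Dataset_B_nolabels.csv. The function name `load_dataset` is hypothetical.

```python
import csv


def load_dataset(path, has_labels=True, skip_header=False):
    """Return (labels, features); labels is None for the unlabelled file.

    Assumes comma-separated values and, by default, no header row.
    """
    expected_cols = 14 if has_labels else 13
    labels, features = [], []
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        if skip_header:
            next(reader, None)
        for row_no, row in enumerate(reader, start=1):
            if not row:
                continue  # tolerate blank trailing lines
            if len(row) != expected_cols:
                raise ValueError(
                    f"data row {row_no}: expected {expected_cols} columns, got {len(row)}")
            values = [float(v) for v in row]
            if has_labels:
                labels.append(int(values[0]))  # column 1 is the 0/1 label
                features.append(values[1:])    # columns 2-14
            else:
                features.append(values)        # all 13 columns are features
    return (labels if has_labels else None), features
```

Keeping the row order of Dataset_B_nolabels.csv intact matters, because the predictions in part (B) are matched to the samples by position.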
In the coursework you are asked to do the following:
(A) Build and train a binary classifier that predicts the label in the first column of Dataset_A.csv. Style is free, but your approach should use neural networks in a meaningful way. [6 marks]
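Part (A) leaves the architecture open. Purely to show the mechanics of a neural-network binary classifier, here is a dependency-free sketch of a network with one tanh hidden layer and a sigmoid output, trained by stochastic gradient descent on binary cross-entropy. The class `TinyMLP` and its default hyperparameters are illustrative assumptions. A real submission would more likely use a framework such as PyTorch or Keras, and the inputs should be standardized first, since prices quoted in dollars × 10 000 would otherwise saturate the tanh units.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """One tanh hidden layer and a sigmoid output unit, trained by plain SGD
    on binary cross-entropy. Illustrative only (pure Python, slow)."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        # h_j = tanh(W1[j] . x + b1[j]);  p = sigmoid(w2 . h + b2)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def predict_proba(self, x):
        """Estimated probability that the label is 1 (midprice up)."""
        return self._forward(x)[1]

    def fit(self, X, y, epochs=50, lr=0.1, seed=0):
        rng = random.Random(seed)
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)  # fresh random visiting order each epoch
            for i in order:
                x, t = X[i], y[i]
                h, p = self._forward(x)
                d_out = p - t  # gradient of BCE with respect to the output logit
                for j, hj in enumerate(h):
                    # Back-propagate through tanh (derivative 1 - tanh^2),
                    # using w2[j] before it is updated.
                    d_hid = d_out * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * d_out * hj
                    self.b1[j] -= lr * d_hid
                    row = self.W1[j]
                    for k, xk in enumerate(x):
                        row[k] -= lr * d_hid * xk
                self.b2 -= lr * d_out
        return self
```

For a standardized feature vector `x` with 13 entries, `TinyMLP(n_in=13).fit(X_train, y_train).predict_proba(x) >= 0.5` yields a 0/1 prediction. Depth, width, learning rate and number of epochs are the kind of choices that should be tuned on a validation set.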
(B) Use the binary classifier created in part (A) to predict the labels missing from Dataset_B_nolabels.csv. That is, you are asked to produce 10 001 predictions of the form 0/1. [4 marks]
Your solution to part (B) (set of predictions) should be a text file with 10 001 rows containing 0s and 1s. Name this file as [your CID]_[your surname].txt. For example, a fictitious person Damiano Brigo with CID 00123456 should name his file as 00123456_Brigo.txt. Please adhere to this format carefully, as your solutions will be processed automatically.
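Because the file is processed automatically, generating it programmatically reduces the chance of format slips. This sketch assumes the classifier outputs a probability of an up move for each row of Dataset_B_nolabels.csv, in file order. It thresholds at 0.5, checks the row count and writes one 0 or 1 per line, with a single trailing newline (an assumption about what the automatic processing accepts). The function name `write_predictions` and its `directory`, `threshold` and `expected_rows` parameters are hypothetical.

```python
import os


def write_predictions(probabilities, cid, surname, directory=".",
                      threshold=0.5, expected_rows=10_001):
    """Threshold P(up) into 0/1 labels and write '<CID>_<surname>.txt'.

    `probabilities` must follow the row order of Dataset_B_nolabels.csv.
    """
    if not isinstance(cid, str):
        raise TypeError("pass the CID as a string so leading zeros are kept")
    if len(probabilities) != expected_rows:
        raise ValueError(
            f"expected {expected_rows} predictions, got {len(probabilities)}")
    labels = [1 if p >= threshold else 0 for p in probabilities]
    path = os.path.join(directory, f"{cid}_{surname}.txt")
    with open(path, "w") as fh:
        # One label per line; a single trailing newline ends the last row.
        fh.write("\n".join(str(v) for v in labels) + "\n")
    return path
```

The CID is taken as a string on purpose: stored as an integer, 00123456 would lose its leading zeros and the file name would no longer match the required pattern.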
Your solution to (B) will be marked based on accuracy, defined as $\text{accuracy} = \frac{\text{number of correctly predicted labels}}{\text{total number of labels}}$.
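The marking metric is easy to reproduce on a held-out validation set. The sketch below also computes a majority-class baseline, i.e. the accuracy of always predicting the more frequent training label. This baseline is a suggested reference point that a useful classifier should beat, not part of the brief; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Number of correctly predicted labels divided by the total number of labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_train, y_val):
    """Validation accuracy of always predicting the most frequent training label."""
    majority = 1 if 2 * sum(y_train) >= len(y_train) else 0  # ties go to 1
    return accuracy(y_val, [majority] * len(y_val))
```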
Hint: It is a good idea not to use the entire data set in Dataset_A.csv to do the training, but instead split it into training and validation sets.
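One way to follow the hint: because the rows are i.i.d., a plain random shuffle is a legitimate way to split, with no time order to preserve. The sketch below holds out 20% of the rows for validation (an arbitrary choice) and standardizes each feature using means and standard deviations computed on the training part only, so that no validation information leaks into the preprocessing. The helper names are hypothetical.

```python
import random
import statistics


def train_val_split(X, y, val_fraction=0.2, seed=42):
    """Shuffle row indices and hold out `val_fraction` of them for validation.

    A plain random split is appropriate because the rows are i.i.d.
    """
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    n_val = round(val_fraction * len(X))
    val_ids, train_ids = order[:n_val], order[n_val:]
    X_train = [X[i] for i in train_ids]
    y_train = [y[i] for i in train_ids]
    X_val = [X[i] for i in val_ids]
    y_val = [y[i] for i in val_ids]
    return X_train, y_train, X_val, y_val


def fit_standardizer(X_train):
    """Per-feature mean and standard deviation, from the training rows only."""
    columns = list(zip(*X_train))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds


def standardize(X, means, stds):
    """Apply (x - mean) / std column-wise using the training statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
```

The same `means` and `stds` should then be applied to the validation rows and to Dataset_B_nolabels.csv before prediction.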