COMS 4995 Piyush Jena (pj2400)
Midterm Assignment
Q1. Implement an encoder-only transformer[1] with the following specifications:
Non-linearity   | tanh
Embedding size  | 64
Attention heads | 4
Encoder layers  | 4
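A minimal NumPy sketch of the forward pass for this encoder is shown below. It only illustrates the required shapes (64-dimensional embeddings, 4 heads of 16 dimensions each, 4 stacked layers, tanh as the non-linearity). Several details are assumptions, not part of the specification: the feed-forward width (`D_FF = 128`), post-layer-norm residual blocks without a learned gain or bias, bias-free attention projections, and the absence of positional encodings. All function names are hypothetical, and backpropagation is not shown.

```python
import numpy as np

D_MODEL, N_HEADS, N_LAYERS = 64, 4, 4  # from the specification table
D_HEAD = D_MODEL // N_HEADS            # 16 dimensions per head
D_FF = 128                             # assumption: feed-forward width is not specified

def he_normal(fan_in, fan_out, rng):
    # He initialization: zero-mean Gaussian with variance 2 / fan_in
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance (no learned scale/shift here)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_layer(rng):
    # Weights use He initialization; biases start at zero
    p = {name: he_normal(D_MODEL, D_MODEL, rng) for name in ("Wq", "Wk", "Wv", "Wo")}
    p["W1"], p["b1"] = he_normal(D_MODEL, D_FF, rng), np.zeros(D_FF)
    p["W2"], p["b2"] = he_normal(D_FF, D_MODEL, rng), np.zeros(D_MODEL)
    return p

def multi_head_attention(x, p, mask=None):
    # x: (T, D_MODEL); mask: optional (T,) bool array, True for real (non-padding) tokens
    T = x.shape[0]
    # Project, then split into heads: (N_HEADS, T, D_HEAD)
    q = (x @ p["Wq"]).reshape(T, N_HEADS, D_HEAD).transpose(1, 0, 2)
    k = (x @ p["Wk"]).reshape(T, N_HEADS, D_HEAD).transpose(1, 0, 2)
    v = (x @ p["Wv"]).reshape(T, N_HEADS, D_HEAD).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(D_HEAD)  # (N_HEADS, T, T)
    if mask is not None:
        scores = np.where(mask[None, None, :], scores, -1e9)  # padded keys get ~0 weight
    heads = softmax(scores) @ v                             # (N_HEADS, T, D_HEAD)
    concat = heads.transpose(1, 0, 2).reshape(T, D_MODEL)   # concatenate the heads
    return concat @ p["Wo"]

def encoder_layer(x, p, mask=None):
    # Residual + layer norm around attention, then around a tanh feed-forward network
    x = layer_norm(x + multi_head_attention(x, p, mask))
    ff = np.tanh(x @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]
    return layer_norm(x + ff)

def encode(x, layers, mask=None):
    # Apply the stacked encoder layers in order
    for p in layers:
        x = encoder_layer(x, p, mask)
    return x
```

For a sentence of 10 tokens, `encode(x, [init_layer(rng) for _ in range(N_LAYERS)])` returns a (10, 64) array, one contextual vector per token; a per-token linear layer from 64 to 4 outputs followed by a softmax would then produce the four class scores.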
Your task is to apply the model to the Part-of-Speech (POS) tagging problem, using the standard CoNLL-2003 dataset[2]. The dataset is split into training, validation, and test sets and is provided in CSV format for simplicity. It contains 46 POS tags in total; we will collapse them into just 4 classes for classification:
- Noun: NN, NNS, NNP, NNPS, NN|SYM, PRP, PRP$
- Verb: VB, VBD, VBG, VBN, VBP, VBZ
- Adjective/Adverb: JJ, JJR, JJS, RB, RBR, RBS
- Others: any remaining POS tag
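The four-way grouping above can be written as a small lookup, sketched here in Python. It assumes the CSV stores tags as strings such as `"NNP"`; if the provided files store integer tag ids instead, those ids would first need to be mapped back to tag strings. The names `collapse_tag` and `CLASSES` are hypothetical.

```python
# Tag groups copied from the class definitions above
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS", "NN|SYM", "PRP", "PRP$"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
ADJ_ADV_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}

CLASSES = ["Noun", "Verb", "Adjective/Adverb", "Others"]

def collapse_tag(tag: str) -> int:
    """Return the index into CLASSES for a fine-grained POS tag."""
    if tag in NOUN_TAGS:
        return 0
    if tag in VERB_TAGS:
        return 1
    if tag in ADJ_ADV_TAGS:
        return 2
    return 3  # every remaining tag (DT, IN, CD, punctuation, ...) falls into "Others"
```

For example, `[CLASSES[collapse_tag(t)] for t in ["NNP", "VBZ", "RB", "DT"]]` gives `['Noun', 'Verb', 'Adjective/Adverb', 'Others']`.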
To reduce the number of parameters that must be learned, we will use pre-trained word2vec[3] word vectors of size 64 (the embedding size). The word vectors are also provided in CSV format.
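One way to load the provided vectors into a fixed lookup table is sketched below. The file layout is an assumption: each row is taken to be `word,v1,...,v64` with no header row, and rows without exactly 64 values are skipped. Mapping out-of-vocabulary words to a zero vector at index 0 is also an assumption, since the assignment does not say how unknown words should be handled. The function names are hypothetical.

```python
import csv
import numpy as np

EMB_DIM = 64  # embedding size from the specification

def load_word_vectors(f, dim=EMB_DIM):
    """Build (vocab, E) from an open CSV file or any iterable of CSV lines."""
    vocab = {"<unk>": 0}       # index 0 is reserved for out-of-vocabulary words
    rows = [np.zeros(dim)]     # assumption: <unk> maps to a zero vector
    for record in csv.reader(f):
        word, values = record[0], record[1:]
        if len(values) != dim:
            continue           # skip rows that do not have exactly `dim` values
        vocab[word] = len(rows)
        rows.append(np.array([float(v) for v in values]))
    return vocab, np.stack(rows)   # E has shape (vocab_size, dim)

def embed(tokens, vocab, E):
    # Plain lookup: the pre-trained vectors stay fixed, so they add no trainable parameters
    ids = [vocab.get(t, 0) for t in tokens]
    return E[ids]              # (len(tokens), dim)
```

Usage would look like `with open("vectors.csv", newline="", encoding="utf-8") as f: vocab, E = load_word_vectors(f)`, where the file name is a placeholder for whichever CSV is provided.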
Initialization:
Random initialization in high-dimensional spaces can cause convergence problems, so we will use He initialization[4] for the weights. Biases are initialized to 0.
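Written out, He initialization[4] draws each weight of a layer with $n_{\text{in}}$ inputs from a zero-mean Gaussian with variance $2/n_{\text{in}}$. The factor of 2 is derived in [4] for rectifier (ReLU) units; here it is simply applied as prescribed, even though the non-linearity is tanh.

$$
W_{ij} \sim \mathcal{N}\!\left(0,\ \frac{2}{n_{\text{in}}}\right), \qquad b_j = 0
$$

For a 64 → 64 projection ($n_{\text{in}} = 64$, the embedding size), this gives a standard deviation of $\sqrt{2/64} = 1/(4\sqrt{2}) \approx 0.177$.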
Training:
Stochastic Gradient Descent (SGD) works, but it requires about 4x as many epochs as the Adam optimizer. As a bonus, implement the Adam optimizer.
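A minimal NumPy sketch of the bonus Adam optimizer follows, assuming the model's parameters are kept in a dict of float arrays (the `Adam` class and its `step` method are hypothetical names). It implements the standard Adam update with bias-corrected first- and second-moment estimates and the commonly used defaults β1 = 0.9, β2 = 0.999, ε = 1e-8; the learning rate is a value to tune.

```python
import numpy as np

class Adam:
    """Adam update for a dict of NumPy parameter arrays (updated in place)."""

    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = params
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = {k: np.zeros_like(v) for k, v in params.items()}  # first-moment estimates
        self.v = {k: np.zeros_like(v) for k, v in params.items()}  # second-moment estimates
        self.t = 0                                                  # step counter

    def step(self, grads):
        self.t += 1
        for k, g in grads.items():
            # Exponential moving averages of the gradient and of its square
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * g
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * g * g
            # Bias correction compensates for the zero initialization of m and v
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            # In-place update, so other references to the array see the new values
            self.params[k] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

Each call to `step` expects gradients with the same keys and shapes as the parameters. Because every parameter gets its own step size scaled by $1/\sqrt{\hat v}$, Adam is less sensitive than plain SGD to the very different gradient magnitudes across embedding, attention, and feed-forward weights.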
Results:
Report the accuracy, precision, and recall for each of the four classes.
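The requested metrics can all be read off a 4 × 4 confusion matrix, as in the NumPy sketch below (`pos_metrics` is a hypothetical name). Because "accuracy for each class" is ambiguous, the sketch reports both the overall token-level accuracy and a per-class one-vs-rest accuracy; which of these is expected is an assumption. Classes that are never predicted (or never occur) get a precision (or recall) of 0.

```python
import numpy as np

def pos_metrics(y_true, y_pred, n_classes=4):
    """Accuracy plus per-class precision and recall from integer class labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # cm[i, j] = tokens of true class i predicted as j
    n = cm.sum()
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)           # column sums: how often each class was predicted
    actual = cm.sum(axis=1)              # row sums: how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros(n_classes), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros(n_classes), where=actual > 0)
    # One-vs-rest accuracy: (TP + TN) / N, with TN = N - predicted - actual + TP
    per_class_accuracy = (n - predicted - actual + 2 * tp) / n
    return {
        "accuracy": tp.sum() / n,
        "per_class_accuracy": per_class_accuracy,
        "precision": precision,
        "recall": recall,
    }
```

If scikit-learn is allowed, `sklearn.metrics.precision_recall_fscore_support(y_true, y_pred, average=None)` returns the same per-class precision and recall.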
References:
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in Neural Information Processing Systems (2017).
[2] Erik F. Tjong Kim Sang and Fien De Meulder. "Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition." In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pages 142–147 (2003).
[3] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification." In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034 (2015).
COMS 4995 HW 3