In this homework we will develop a statistical language model of Turkish that uses N-grams of Turkish syllables.
Follow the steps below for the rest of the homework and for your homework report:
- Download the Turkish Wikipedia dump from https://www.kaggle.com/mustfkeskin/turkishwikipediadump
- Separate each word into its syllables using the same program that you used for HW1
- Calculate the 1-Gram, 2-Gram, 3-Gram, 4-Gram and 5-Gram tables for this set using 95% of the set (if the set is too large, you may use a subset). Note that your N-gram tables will be mostly empty, so you need to store them in a space-efficient way. You also need to apply smoothing; use the Good-Turing (GT) smoothing that we covered in class. A counting and smoothing sketch is given after this list.
- Calculate the perplexity of the 1-Gram to 5-Gram models for each sentence using the chain rule with the Markov assumption. Use the remaining 5% of the set for these calculations. Make a table of your findings in your report and explain your results. A perplexity sketch is also given after this list.
- Produce random sentences for each N-Gram model, picking one of the 5 most likely candidates at random at each step (see the generation sketch after this list). Include these random sentences in your report and discuss them.
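
As a starting point, here is a minimal sketch of how the count tables and Good-Turing adjustment could be organized. It assumes the syllabified corpus is already available as a list of sentences, each a list of syllable tokens; the `<s>`/`</s>` boundary markers are an illustrative choice, not part of the assignment.

```python
from collections import Counter

def build_ngram_counts(sentences, n):
    """Count n-grams in a sparse dict: only observed n-grams are stored."""
    counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] * (n - 1) + sent + ["</s>"]   # pad with boundary markers
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def good_turing_adjust(counts):
    """Good-Turing adjusted counts c* = (c + 1) * N_{c+1} / N_c.

    N_c is the number of distinct n-grams seen exactly c times. When N_{c+1}
    is zero (typical for the highest counts) the raw count is kept. The
    probability mass reserved for unseen n-grams is N_1 / N.
    """
    freq_of_freq = Counter(counts.values())            # N_c
    total = sum(counts.values())                       # N
    adjusted = {}
    for ngram, c in counts.items():
        n_c, n_c1 = freq_of_freq[c], freq_of_freq.get(c + 1, 0)
        adjusted[ngram] = (c + 1) * n_c1 / n_c if n_c1 > 0 else c
    unseen_mass = freq_of_freq.get(1, 0) / total if total else 0.0
    return adjusted, unseen_mass

# Toy usage with a tiny syllabified corpus:
corpus = [["mer", "ha", "ba", " ", "dün", "ya"],
          ["gü", "zel", " ", "bir", " ", "gün"]]
bigrams, unseen = good_turing_adjust(build_ngram_counts(corpus, 2))
```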
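
A possible way to compute per-sentence perplexity under the Markov assumption, summing log probabilities as the notes below recommend. Here `prob(history, token)` is a hypothetical callback standing in for whatever smoothed lookup your implementation provides.

```python
import math

def sentence_perplexity(sentence, n, prob):
    """Perplexity of one sentence: PP = exp(-(1/M) * sum_i log P(w_i | history_i))."""
    tokens = ["<s>"] * (n - 1) + sentence + ["</s>"]
    log_prob, m = 0.0, 0
    for i in range(n - 1, len(tokens)):
        history = tuple(tokens[i - (n - 1):i])  # only the last n-1 tokens matter (Markov assumption)
        log_prob += math.log(prob(history, tokens[i]))
        m += 1
    return math.exp(-log_prob / m)
```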
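
One way the sentence generation could look; `next_probs(history)` is a hypothetical callback returning the smoothed distribution over the next syllable for a given history.

```python
import random

def generate_sentence(n, next_probs, max_len=50):
    """Generate syllables left to right, picking one of the 5 most probable
    continuations uniformly at random at each step."""
    history = ["<s>"] * (n - 1)
    out = []
    for _ in range(max_len):
        dist = next_probs(tuple(history))      # {token: P(token | history)}
        if not dist:
            break
        top5 = sorted(dist, key=dist.get, reverse=True)[:5]
        token = random.choice(top5)
        if token == "</s>":                    # end-of-sentence marker reached
            break
        out.append(token)
        history = (history + [token])[-(n - 1):] if n > 1 else []
    return "".join(out)
```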
Prepare your report and submit it to the Teams page. You may use any programming language for the implementation. You may also use an N-gram library to calculate the N-Grams efficiently; please indicate which library you used.
Notes
- Do not forget to take the logarithm of the probabilities in the chain rule formula and sum them instead of multiplying, so the product does not underflow
- Convert all the letters to lowercase first. You may convert all Turkish characters to their English counterparts, for example ş -> s and ğ -> g (a normalization sketch is given after these notes)
- Do not forget to include punctuation marks (end-of-sentence markers) and space characters as syllables in your N-grams. Lowercase letters and the space character will be enough
- You will demo your homework results online
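
A small sketch of the normalization suggested in the notes above: lowercasing, mapping Turkish characters to ASCII, and keeping only letters, spaces, and a single end-of-sentence mark. Treat it as one possible preprocessing choice, not a required one.

```python
import re

TR_MAP = str.maketrans("çğıöşü", "cgiosu")    # Turkish letters -> closest ASCII letters

def normalize(text):
    # Note: Python's lower() is not Turkish-locale aware (I -> i, not ı);
    # this is acceptable here because ı is mapped to i anyway.
    text = text.lower().translate(TR_MAP)
    text = re.sub(r"[.!?]+", " . ", text)     # collapse sentence-final punctuation to one marker
    text = re.sub(r"[^a-z. ]", " ", text)     # keep only letters, spaces, and the '.' marker
    return re.sub(r" +", " ", text).strip()
```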