Name: [Solved] BIM207-filename and topN
Brand: Assignment Chef

Your program takes two arguments: filename and topN.
You should read the given text file and preprocess the text in the following order: tokenize the text by whitespace (not just the space character; runs of spaces, tabs, newlines, etc. all separate tokens), remove punctuation, and convert to lowercase.
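The preprocessing order above can be sketched in Java as follows. This is a minimal sketch under assumptions the assignment leaves open: "punctuation" is taken to mean ASCII punctuation (Java's \p{Punct} class), tokens that become empty after stripping (such as a lone --) are dropped, and lowercasing uses Locale.ROOT so the result does not depend on the machine's default locale. The class name Preprocess is hypothetical. The sample output (tokens such as kltrel and sayl) suggests the reference solution also removed non-ASCII letters; this sketch keeps them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class Preprocess {

    // 1. Split on any run of whitespace (spaces, tabs, newlines, ...).
    // 2. Remove ASCII punctuation from each token.
    // 3. Lowercase with Locale.ROOT so the result is the same on every machine.
    // Tokens that are empty after step 2 are dropped (an assumption).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String cleaned = raw.replaceAll("\\p{Punct}", "").toLowerCase(Locale.ROOT);
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("  Hello,\tWORLD!\n\n-- it's  fine. ")); // [hello, world, its, fine]
    }
}
```

The leading empty string that split produces for input starting with whitespace is removed by the same emptiness check, so no separate trim is needed.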
You are asked to calculate the following:

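Average Term Length by Initial Character: for example, if your tokens are [apple, banana, avocado, blueberry], then your output should look like

a = 6 b = 7.5

since the a-terms (apple, avocado) have lengths 5 and 7 and the b-terms (banana, blueberry) have lengths 6 and 9.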
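A sketch of the average-length computation, assuming the tokens have already been preprocessed and are non-empty. The assignment's example has no repeated tokens, so it does not settle whether a repeated token counts once or once per occurrence; this sketch counts every occurrence. A TreeMap keeps the initial characters sorted, which puts digits before letters as in the sample output. InitialLengths and averageLengthByInitial are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InitialLengths {

    // Average token length grouped by the token's first character.
    // Assumption: every occurrence counts (repeated tokens are not deduplicated).
    static Map<Character, Double> averageLengthByInitial(List<String> tokens) {
        Map<Character, int[]> acc = new TreeMap<>(); // [0] = total length, [1] = count
        for (String t : tokens) {
            int[] a = acc.computeIfAbsent(t.charAt(0), k -> new int[2]);
            a[0] += t.length();
            a[1]++;
        }
        Map<Character, Double> result = new TreeMap<>(); // sorted by initial character
        acc.forEach((c, a) -> result.put(c, (double) a[0] / a[1]));
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("apple", "banana", "avocado", "blueberry");
        averageLengthByInitial(tokens).forEach((c, avg) -> System.out.println(c + " " + avg));
    }
}
```

Running main prints a 6.0 and b 7.5 on separate lines, matching the example above.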

Total Minimum Distance: for each term pair, calculate the following formula:

$$\mathrm{score}(t_1, t_2) = \frac{f(t_1) \cdot f(t_2)}{1 + \ln d(t_1, t_2)}$$

where f(t) is the number of occurrences of the term t in the text and d(t₁,t₂) is the minimum distance between t₁ and t₂ where t₁ is followed by t₂. For example, if the text is

aa bb cc aa cc dd bb

and t₁ = aa and t₂ = bb, then d(t₁,t₂) = 1 + 3 = 4: the first aa is followed by bb one position later, and the second aa is followed by bb three positions later.

You should print only the topN pairs with the highest scores.
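One way to compute d(t₁,t₂) and the scores, under an interpretation the assignment does not state explicitly: for every occurrence of t₁, take the distance to the nearest later occurrence of t₂ and sum these distances; occurrences of t₁ with no later t₂ contribute nothing, and pairs with t₁ = t₂ are skipped (the sample output lists none). This reproduces d(aa,bb) = 4 for the example, and with f(aa) = f(bb) = 2 the score is 4 / (1 + ln 4) ≈ 1.676. The sketch walks the tokens from right to left while remembering the nearest later position of every term, so it runs in O(N·V) time for N tokens and V distinct terms. Class and method names are hypothetical, the Pair class's toString only imitates the format of the sample output, and breaking score ties alphabetically is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MinDistance {

    // Output type; toString imitates the "Pair{t1=..., t2=..., factor=...}" sample lines.
    static final class Pair {
        final String t1;
        final String t2;
        final double factor;

        Pair(String t1, String t2, double factor) {
            this.t1 = t1;
            this.t2 = t2;
            this.factor = factor;
        }

        @Override
        public String toString() {
            return "Pair{t1=" + t1 + ", t2=" + t2 + ", factor=" + factor + "}";
        }
    }

    // d(t1, t2): for each occurrence of t1, the distance to the nearest later t2, summed.
    // Assumptions: occurrences of t1 with no later t2 add nothing; t1 == t2 is skipped.
    static Map<String, Map<String, Integer>> totalMinDistances(List<String> tokens) {
        Map<String, Map<String, Integer>> d = new HashMap<>();
        Map<String, Integer> nextPos = new HashMap<>(); // term -> nearest index after i
        for (int i = tokens.size() - 1; i >= 0; i--) {
            String t1 = tokens.get(i);
            Map<String, Integer> row = d.computeIfAbsent(t1, k -> new HashMap<>());
            for (Map.Entry<String, Integer> e : nextPos.entrySet()) {
                if (!e.getKey().equals(t1)) {
                    row.merge(e.getKey(), e.getValue() - i, Integer::sum);
                }
            }
            nextPos.put(t1, i); // t1 is now the nearest occurrence for positions before i
        }
        return d;
    }

    // Scores each pair as f(t1) * f(t2) / (1 + ln d(t1, t2)) and keeps the topN highest.
    static List<Pair> topPairs(List<String> tokens, int topN) {
        Map<String, Integer> freq = new HashMap<>();
        for (String t : tokens) {
            freq.merge(t, 1, Integer::sum);
        }
        List<Pair> pairs = new ArrayList<>();
        totalMinDistances(tokens).forEach((t1, row) -> row.forEach((t2, dist) ->
                pairs.add(new Pair(t1, t2,
                        freq.get(t1) * (double) freq.get(t2) / (1 + Math.log(dist))))));
        // Highest score first; ties broken alphabetically so the output is deterministic.
        pairs.sort(Comparator.comparingDouble((Pair p) -> p.factor).reversed()
                .thenComparing(p -> p.t1)
                .thenComparing(p -> p.t2));
        return pairs.subList(0, Math.min(topN, pairs.size()));
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("aa", "bb", "cc", "aa", "cc", "dd", "bb");
        System.out.println(totalMinDistances(tokens).get("aa").get("bb")); // 4
        topPairs(tokens, 3).forEach(System.out::println);
    }
}
```

Running main prints 4 followed by the three highest-scoring pairs for the example text. If only the top pairs are needed on a large vocabulary, a bounded priority queue of size topN could replace the full sort.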

Important!

Make sure the following commands run successfully:

mvn clean package

java -jar target/bim207hw.jar sampleText.txt 10
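The two commands imply that the program receives the file name as args[0] and topN as args[1]. A minimal sketch of an entry point that validates both and reads the file, assuming UTF-8 input and treating a topN below 1 as an error (neither is specified in the assignment); the class name Bim207Main is hypothetical. For java -jar to start it, the jar's manifest must name the main class, which the maven-jar-plugin can do through its archive/manifest/mainClass configuration, and the jar is written as target/bim207hw.jar when the build's finalName is bim207hw.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical entry point; the class name is illustrative, not from the assignment.
class Bim207Main {

    // Checks that exactly two arguments were given and that topN is a positive integer.
    static int parseTopN(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: java -jar target/bim207hw.jar <filename> <topN>");
        }
        int topN;
        try {
            topN = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("topN must be an integer, got: " + args[1], e);
        }
        if (topN <= 0) {
            throw new IllegalArgumentException("topN must be positive, got: " + topN);
        }
        return topN;
    }

    public static void main(String[] args) throws IOException {
        int topN = parseTopN(args);
        // Assumption: the input file is UTF-8 encoded.
        String text = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
        System.out.println("Read " + text.length() + " characters, topN = " + topN);
        // Preprocessing and the two statistics would be computed from `text` here.
    }
}
```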

Sample Output

InitialCharacter AverageLength

1 3.5
2 2.0
3 5.0
5 1.0
7 4.0
...
f 6.0
g 7.125
h 5.375
i 6.0
k 9.266666666666667
m 5.857142857142857
o 8.0
p 8.5
r 6.0
s 7.214285714285714
t 6.363636363636363
...
y 10.0
z 7.5
...

Pair{t1=yerlekesindeki, t2=ve, factor=26.0}

Pair{t1=ve, t2=sayl, factor=15.356018837890671}

Pair{t1=tarih, t2=ve, factor=13.0}

Pair{t1=donanml, t2=ve, factor=13.0}

Pair{t1=rencileri, t2=ve, factor=13.0}

Pair{t1=syleilere, t2=ve, factor=13.0}

Pair{t1=yaratc, t2=ve, factor=13.0}

Pair{t1=eden, t2=ve, factor=13.0}

Pair{t1=ve, t2=30425, factor=13.0}

Pair{t1=kltrel, t2=ve, factor=13.0}
