[Solved] CSE508 Assignment 2

$25

File Name: CSE508_Assignment_2.zip
File Size: 178.98 KB


Question 1: Download the dataset from http://archives.textfiles.com/stories.zip

You need to implement a CLI tool for:

  • Jaccard coefficient based document retrieval: for each query, your system will output the top k documents based on the Jaccard score.
  • Tf-Idf based document retrieval: for each query, your system will output the top k documents based on the tf-idf matching score. Implement different versions of Tf-Idf based document retrieval, then compare them and analyze which performs better and why.
  • Tf-Idf based vector space document retrieval: for each query, your system will output the top k documents based on the cosine similarity between the query vector and the document vector.
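
The three scoring schemes listed above can be sketched in a few lines. This is only an illustrative outline, not the graded solution: the function names (`tokenize`, `jaccard`, `build_idf`, `tfidf_vector`, `matching_score`, `cosine`, `top_k`) are hypothetical, the tokenizer is a naive whitespace split, and the smoothed idf formula is just one of the tf-idf variants the assignment asks you to compare.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def jaccard(query_tokens, doc_tokens):
    """Size of the term-set intersection divided by the size of the union."""
    q, d = set(query_tokens), set(doc_tokens)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def build_idf(tokenized_docs):
    """Smoothed idf, log((N + 1) / (df + 1)) + 1; one variant among several."""
    n = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {term: math.log((n + 1) / (count + 1)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term count times idf; terms unseen in the corpus are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def matching_score(query_tokens, doc_vector):
    """Sum of the document's tf-idf weights over the distinct query terms."""
    return sum(doc_vector.get(t, 0.0) for t in set(query_tokens))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query, docs, k, method="jaccard"):
    """Rank docs (a name -> text dict) by 'jaccard', 'matching' or 'cosine'."""
    q = tokenize(query)
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    idf = build_idf(list(tokenized.values()))
    vectors = {name: tfidf_vector(toks, idf) for name, toks in tokenized.items()}
    if method == "jaccard":
        scores = {name: jaccard(q, toks) for name, toks in tokenized.items()}
    elif method == "matching":
        scores = {name: matching_score(q, vec) for name, vec in vectors.items()}
    elif method == "cosine":
        q_vec = tfidf_vector(q, idf)
        scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    else:
        raise ValueError(f"unknown method: {method}")
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

Calling `top_k(query, docs, k, method)` with each of the three method names on the same query is a simple way to see where the rankings diverge. Jaccard ignores term frequency entirely, the matching score favours long documents, and cosine normalises document length away.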

In addition, ensure that numerical queries work, for example "100 animals", "50,000 variety of flowers", or "population of 1 billion".
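
One way to make numerical queries match is to normalise numbers during tokenization so that "50,000" in a query and "50000" in a document become the same token. The sketch below is an assumption about how this could be done, not a prescribed method. The name `tokenize_with_numbers` is hypothetical, and it does not attempt to map number words such as "billion" to digits.

```python
import re

# A number (digits with optional thousands separators and decimals) or a word.
_TOKEN = re.compile(r"\d[\d,]*(?:\.\d+)?|[a-z]+")


def tokenize_with_numbers(text):
    """Lowercase, keep words and numbers, and strip commas inside numbers."""
    tokens = []
    for match in _TOKEN.finditer(text.lower()):
        token = match.group()
        if token[0].isdigit():
            token = token.replace(",", "")  # "50,000" -> "50000"
        tokens.append(token)
    return tokens
```

The same tokenizer has to be applied to both documents and queries; otherwise the normalised query tokens will have nothing to match against in the index.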

Give special attention to the terms in the document title and analyze the change in result with and without attention to terms in title.
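
A simple way to give title terms extra attention is to count each title occurrence as several body occurrences before computing tf-idf. The sketch below assumes this boosting approach. The function name `weighted_term_counts` and the default `title_weight=3` are hypothetical choices for illustration, and other schemes (for example, scoring title and body separately and mixing the two scores) are equally valid.

```python
from collections import Counter


def weighted_term_counts(title_tokens, body_tokens, title_weight=3):
    """Term counts where each title occurrence adds title_weight to the count."""
    counts = Counter(body_tokens)
    for token in title_tokens:
        counts[token] += title_weight
    return counts
```

Setting `title_weight=0` reproduces the ranking without title attention, which gives a direct baseline for the comparison the assignment asks for.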

Compare and state pros and cons for all the techniques.

Question 2: Download the dictionary from http://www.gwicks.net/dictionaries.htm (UK ENGLISH 65,000 words)

Take a sentence as input from the user. For each non-dictionary word present in the sentence, suggest the top k words on the basis of minimum edit distance. The cost of each operation is defined as:

  • Insert: 2
  • Delete: 1
  • Replace: 3
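
The weighted edit distance can be computed with the usual dynamic-programming table, using the costs above in place of unit costs. The sketch below assumes that the operations transform the user's word into the candidate dictionary word; the assignment text does not state the direction, and because insert and delete have different costs, the direction changes the result. The names `edit_distance` and `suggest` are hypothetical.

```python
import heapq

INSERT_COST, DELETE_COST, REPLACE_COST = 2, 1, 3


def edit_distance(source, target):
    """Minimum cost of turning source into target with the costs above."""
    # prev[j] is the cost of building target[:j] from an empty source prefix.
    prev = [j * INSERT_COST for j in range(len(target) + 1)]
    for i, s_ch in enumerate(source, start=1):
        curr = [i * DELETE_COST]  # delete every character of source[:i]
        for j, t_ch in enumerate(target, start=1):
            substitute = prev[j - 1] + (0 if s_ch == t_ch else REPLACE_COST)
            delete = prev[j] + DELETE_COST
            insert = curr[j - 1] + INSERT_COST
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def suggest(word, dictionary, k):
    """Return the k dictionary words closest to word; ties break alphabetically."""
    return heapq.nsmallest(
        k, dictionary, key=lambda candidate: (edit_distance(word, candidate), candidate)
    )
```

With these costs a replace (3) costs the same as a delete followed by an insert (1 + 2), so the distance depends only on how many characters are dropped and how many are added. Suggestions that are shorter than the input word are therefore cheaper than ones that are longer.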
