Use the 20 newsgroups dataset to build a unigram inverted index.
- Provide support for the following commands:
  - x OR y
  - x AND y
  - x AND NOT y
  - x OR NOT y
  where x and y are taken as input from the user.
Your query output should be:
- the number of docs retrieved
- the minimum number of total comparisons done (if any)
- the list of documents retrieved
* Note that queries can contain more than two words, in the form x OP1 y OP2 z, where OP1 and OP2 are drawn from AND, OR, and NOT. Try to write generalized code that handles a variable number of words in the query.
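
The requirements above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: it assumes the documents are already loaded into a dict mapping a document ID to its text, uses a simple lowercase alphanumeric tokenizer, and evaluates the query strictly left to right. All function names (`tokenize`, `build_index`, `intersect`, `union`, `complement`, `evaluate`) are hypothetical. Comparisons are counted only inside the sorted-list merges, and because terms are not reordered the count is not guaranteed to be the minimum.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumerics (an assumed, simple scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Unigram inverted index: term -> sorted list of doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def intersect(a, b):
    """AND via a two-pointer merge; returns (result, comparisons)."""
    i = j = comparisons = 0
    out = []
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out, comparisons

def union(a, b):
    """OR via a two-pointer merge; returns (result, comparisons)."""
    i = j = comparisons = 0
    out = []
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # leftovers need no further comparisons
    out.extend(b[j:])
    return out, comparisons

def complement(a, all_ids):
    """NOT: every doc ID absent from a (assumption: not counted as comparisons)."""
    excluded = set(a)
    return [d for d in all_ids if d not in excluded]

def evaluate(query, index, all_ids):
    """Evaluate 'x OP y OP z ...' left to right; OP is AND or OR, optionally followed by NOT."""
    tokens = query.split()
    result = index.get(tokens[0].lower(), [])
    total = 0
    pos = 1
    while pos < len(tokens):
        op = tokens[pos]
        pos += 1
        negate = tokens[pos] == "NOT"
        if negate:
            pos += 1
        operand = index.get(tokens[pos].lower(), [])
        pos += 1
        if negate:
            operand = complement(operand, all_ids)
        merge = intersect if op == "AND" else union
        result, comparisons = merge(result, operand)
        total += comparisons
    return result, total
```

For example, with `docs = {1: "cat dog", 2: "dog fish", 3: "cat fish", 4: "bird"}`, calling `evaluate("cat AND NOT dog", build_index(docs), sorted(docs))` yields the retrieved document IDs together with the merge comparison count; the number of documents retrieved is the length of the returned list.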
- Provide support for phrase queries using positional indexes. (For this question, build the index only on comp.graphics and rec.motorcycles.)
You may assume that the phrase query length is less than or equal to 5.
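
One possible shape for the positional index and phrase lookup is sketched below. It assumes the documents from the two newsgroups are already loaded into a dict of document ID to text, and it reuses a simple lowercase alphanumeric tokenizer; the names `tokenize`, `build_positional_index`, and `phrase_query` are hypothetical. The sketch does not enforce the length limit of 5, it simply works for any phrase length.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumerics (an assumed, simple scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs):
    """Positional index: term -> {doc_id -> ascending list of token positions}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index

def phrase_query(phrase, index):
    """Return sorted doc IDs in which the phrase terms occur consecutively."""
    terms = tokenize(phrase)
    if not terms or any(term not in index for term in terms):
        return []
    # Only documents containing every term can possibly match.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    hits = []
    for doc_id in sorted(candidates):
        later = [set(index[term][doc_id]) for term in terms[1:]]
        # The k-th following term must sit exactly k + 1 positions after the start.
        for start in index[terms[0]][doc_id]:
            if all(start + k + 1 in positions for k, positions in enumerate(later)):
                hits.append(doc_id)
                break
    return hits
```

Converting the later terms' position lists to sets keeps each adjacency check cheap; a merge over the sorted position lists would work equally well and makes comparison counting easier if that is also required here.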