Download and unzip the files Project4_sentences.zip and Project4_code.zip.

A set of sentences is given in the file sentences.txt. Each sentence is a line in the file. Create the feature vector by writing a program that applies the following text mining techniques to this set of sentences.

A. Tokenize sentences.
B. Remove punctuation and special characters.
C. Remove numbers.
D. Convert upper-case to lower-case.
E. Remove stop words. A set of stop words is provided in the file txt
F. Perform stemming. Use the Porter stemming code provided in the file txt
G. Combine stemmed words.
H. Extract most frequent words.
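
The steps above can be sketched in Python roughly as follows. This is only an illustration under several assumptions: the function names, the min_count threshold used to decide what counts as "most frequent", and the tiny sample sentences and stop-word set are all hypothetical. The toy_stem function is a placeholder standing in for the Porter stemmer supplied with the project, and is not the Porter algorithm.

```python
import re
from collections import Counter


def preprocess(sentence, stop_words, stem):
    """Apply the cleaning steps to one sentence and return its stemmed tokens."""
    tokens = sentence.split()                                   # tokenize on whitespace
    tokens = [re.sub(r"[^A-Za-z0-9]", "", t) for t in tokens]   # drop punctuation/special chars
    tokens = [re.sub(r"[0-9]", "", t) for t in tokens]          # drop digits
    tokens = [t.lower() for t in tokens if t]                   # lower-case, skip empty tokens
    tokens = [t for t in tokens if t not in stop_words]         # remove stop words
    return [stem(t) for t in tokens]                            # stem each remaining token


def feature_vector(sentences, stop_words, stem, min_count=2):
    """Combine stemmed tokens from all sentences and keep the frequent ones."""
    docs = [preprocess(s, stop_words, stem) for s in sentences]
    counts = Counter(tok for doc in docs for tok in doc)        # combine stemmed words
    # min_count is an assumed cut-off; the assignment does not fix a threshold.
    keywords = sorted(w for w, c in counts.items() if c >= min_count)
    return keywords, docs


def toy_stem(word):
    # Placeholder only: strips a trailing "s". Swap in the provided Porter stemmer.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


# Hypothetical sample data, not taken from the project files.
sentences = [
    "The 2 cars were stolen!",
    "Police identify the stolen car.",
    "An anonymous tip.",
]
stop_words = {"the", "were", "an"}

keywords, docs = feature_vector(sentences, stop_words, toy_stem)
print(keywords)
```

In a real run, the sentences and stop words would be read from the provided files, one entry per line, and stem would wrap whichever version of the Porter stemmer you chose.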

Provide the feature vector in your report.

Note:

The feature vector contains the set of unique words that appear in the set of sentences provided.

The file Project4_code.zip contains implementations of the Porter Stemmer in several languages. You can use any of the provided versions (Java, Matlab, Python, and C). Make sure you rename your file accordingly. More source code for the Porter Stemmer can be found here: http://tartarus.org/martin/PorterStemmer/

CMSC 409: Artificial Intelligence

Project 4

Using the feature vector generated in the first task, write a program that generates the Term Document Matrix (TDM) for ALL the sentences in the sentences file, similar to the example TDM below.

Example TDM

Keyword set	anonymous	identify	car
Sentence 1	1	4	3
Sentence 2	2	0	1
..
Sentence 20	2	0	0

Provide the TDM in your report.
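
As a rough sketch of how such a matrix could be built, the Python snippet below counts how often each keyword occurs in each preprocessed sentence. The function name term_document_matrix and the two short token lists are hypothetical; in practice the rows would come from the cleaned sentences and the columns from your feature vector.

```python
def term_document_matrix(docs, keywords):
    """Return one row per sentence; each cell counts a keyword's occurrences."""
    return [[doc.count(k) for k in keywords] for doc in docs]


# Hypothetical inputs: a keyword set and two already-preprocessed sentences.
keywords = ["anonymous", "identify", "car"]
docs = [
    ["car", "identify", "car"],
    ["anonymous", "car"],
]

tdm = term_document_matrix(docs, keywords)

# Print in the same tab-separated layout as the example TDM.
print("Keyword set\t" + "\t".join(keywords))
for i, row in enumerate(tdm, start=1):
    print(f"Sentence {i}\t" + "\t".join(str(c) for c in row))
```

Using raw counts matches the example table; other weightings (for instance binary presence) would be a different design choice.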
For each of the text mining steps (A to H), explain why they are used, and what sort of information is lost while applying each of the text-mining steps.
Write a program implementing the clustering algorithm of your choice (WTA or FCAN). Apply that algorithm to the TDM to group similar sentences together.
1. How many clusters/topics have you identified?
2. What drives the dimensionality of the TDM? What can you do to reduce that dimensionality? Does the order of data being fed to the algorithm matter?
3. Show and comment on the results.
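
For orientation, here is a minimal sketch of one common winner-take-all (WTA) formulation: competitive learning in which only the best-matching neuron is moved toward each input. It is one possible reading of WTA, not necessarily the variant taught in the course. The function names, the learning rate alpha, the epoch count, the way initial weights are picked from the data, and the small sample TDM are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def wta_cluster(rows, n_clusters, alpha=0.3, epochs=20):
    """Winner-take-all competitive learning; returns one cluster label per row."""
    data = [normalize(r) for r in rows]
    # Assumed initialisation: take evenly spaced rows as the starting weights.
    step = max(1, len(data) // n_clusters)
    weights = [list(data[(i * step) % len(data)]) for i in range(n_clusters)]

    def winner(x):
        # The winner is the neuron whose weight vector has the largest dot product with x.
        sims = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
        return sims.index(max(sims))

    for _ in range(epochs):
        for x in data:
            j = winner(x)
            # Move only the winning neuron toward the input, then renormalise it.
            moved = [w_i + alpha * (x_i - w_i) for w_i, x_i in zip(weights[j], x)]
            weights[j] = normalize(moved)
    return [winner(x) for x in data]


# Hypothetical TDM rows (sentences) over three keywords.
tdm = [
    [1, 4, 3],
    [1, 3, 3],
    [2, 0, 0],
    [3, 0, 1],
]
labels = wta_cluster(tdm, n_clusters=2)
print(labels)
```

Because the weights are updated after every sentence, a sketch like this is sensitive to the initial weights and to the order in which rows are presented, which is worth testing when you answer question 2.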