Objective:
The objective of this assignment is to understand the BI framework, build star/snowflake schemas, and apply the concepts of sentiment and semantic analysis.
Assignment Rubric

| Criterion | Excellent (25%) | Proficient (15%) | Marginal (5%) | Unacceptable (0%) | Problem # where applied |
|---|---|---|---|---|---|
| Completeness including Citation | All required tasks are completed | Submission highlights task completion; however, some tasks in between are missed, which creates a disconnection | Some tasks are completed, but they are disjoint in nature | Incorrect and irrelevant | Problem #3 |
| Correctness | All parts of the given tasks are correct | Most of the given tasks are correct; however, some portions need minor modifications | Most of the given tasks are incorrect; the submission requires major modifications | Incorrect and unacceptable | Problem #2 |
| Novelty | The submission contains novel contributions in key segments, which is a clear indication of application knowledge | The submission lacks significant novel contributions; there is some evidence of novelty, but it is not significant | The submission does not contain novel contributions; however, there is evidence of some effort | There is no novelty | Problem #1 |
| Clarity | The written or graphical materials and developed applications provide a clear picture of the concept and demonstrate clarity | The written or graphical materials and developed applications do not show a clear picture of the concept; there is room for improvement | The written or graphical materials and developed applications fail to demonstrate clarity; background knowledge is needed | Failed to demonstrate clarity; proper background knowledge is needed to perform the tasks | Problem #1 |
Citation:
McKinney, B. (2018). The impact of program-wide discussion board grading rubrics on students and faculty
satisfaction. Online Learning, 22(2), 289-299.
Tasks
- This assignment requires you to submit your programming code on GitLab and a single PDF file on Brightspace.
Problem #1
Business Intelligence Reporting using Cognos
1. Download the weather dataset available at https://www.kaggle.com/PROPPG-PPG/hourly-weather-surface-brazil-southeast-region?select=sudeste.csv
2. Explore the dataset and identify the data field(s) that could be measured by certain factors or dimensions. (Follow recorded lecture #18 and synchronous session #18.)
Example: In a sales dataset, you may find a measurable field such as total sales, which could be analyzed by other factors such as product, time, and location. These factors are known as dimensions. Depending on the data, you may also find opportunities for slice and dice, i.e. analysis at a more granular level, such as moving from total sales by city to total sales by store.
3. Write a half-page explanation of how you selected the measurable field, i.e. the fact, and what the possible dimensions are. Include this part in your PDF file.
4. Clean the dataset and, if required, perform formatting. You can do the cleaning and formatting with spreadsheet operations or a programming script. If you use a program, add it to GitLab (an illustrative sketch appears after this task list); if you use other methods, describe the steps in the PDF file.
5. Create a Cognos account and import your dataset. Identify the dimensions, and create/import the dimension tables.
6. Based on your understanding of the domain (please read the information/metadata available on the dataset source, i.e. Kaggle), create a star schema or a snowflake schema. Provide justification for your model in the PDF file.
7. In addition to the justification, attach a screenshot of the model (star schema or snowflake schema) in the PDF file.
8. Display a visual analysis of the data in a suitable format, e.g. a bar graph showing temperature change across a suitable dimension. Add a screenshot of the analysis to the PDF, or add a screen recording of the analysis to your .zip folder.
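If you choose the scripting route for step 4, a minimal Java sketch of the idea is shown below. It is only an illustration, not a required implementation: the file names (sudeste.csv, sudeste_clean.csv), the comma delimiter, and the rule "drop any row with an empty field" are assumptions; adapt them to the problems you actually find in the data.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

// Minimal cleaning sketch: copies sudeste.csv to sudeste_clean.csv,
// keeping only rows in which every field is non-empty.
// File names, delimiter, and the cleaning rule are assumptions for this example.
public class CleanWeatherCsv {
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new FileReader("sudeste.csv"));
             PrintWriter out = new PrintWriter("sudeste_clean.csv")) {
            String header = in.readLine();             // keep the header row as-is
            out.println(header);
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
                boolean complete = true;
                for (String field : fields) {
                    if (field.trim().isEmpty()) { complete = false; break; }
                }
                if (complete) out.println(line);       // write only complete rows
            }
        }
    }
}
```

Whatever script you end up writing belongs in your GitLab repository; the steps it performs should still be summarized in the PDF.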
Problem #2
Sentiment Analysis (Java program only)
1. To perform this task, you need to consider the processed news (content or descriptions only; ignore other fields) that you obtained and stored in MongoDB in your previous assignment. If you could not perform/complete that task, obtain the processed MongoDB news collection by contacting your TA Kethan (Cc me in that email).
2. Write a script to create a bag-of-words for each news article (code from online or other sources is not accepted).
e.g. news 1 = Canada is cold cold. I feel good not bad
bow1 = {Canada:1, is:1, cold:2, I:1, feel:1, good:1, not:1, bad:1}
You do not need any libraries; just implement a simple counter using a loop (a sketch of this counter appears after the example table below).
Compare each bag-of-words with a list of positive and negative words. You can download lists of positive and negative words from online sources; no libraries are needed. Just perform a word-by-word comparison against the lists (a sketch of this comparison also appears after the example table below). E.g. negative words can be found here: https://gist.github.com/mkulakowski2/4289441
3. Tag each news article as positive, negative, or neutral based on its overall score. You can add an additional column to present your finding, as in the example below.
| News Article | News Description/Content | Match | Polarity |
|---|---|---|---|
| 1 | Canada is cold cold. I feel good not bad | cold, good, not, bad | negative |
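For the bag-of-words step, a plain loop is enough; no library is needed. The sketch below is a minimal illustration of the counting idea only. The lower-casing and punctuation stripping are assumptions made for the example, not requirements from the assignment text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative bag-of-words counter: splits a news description on whitespace
// and counts each word with a simple loop and a map.
// Lower-casing and punctuation removal are assumptions for this sketch.
public class BagOfWords {
    public static Map<String, Integer> build(String text) {
        Map<String, Integer> bow = new LinkedHashMap<>();
        for (String raw : text.split("\\s+")) {
            String word = raw.toLowerCase().replaceAll("[^a-z]", "");
            if (word.isEmpty()) continue;
            bow.merge(word, 1, Integer::sum);   // increment the counter for this word
        }
        return bow;
    }

    public static void main(String[] args) {
        System.out.println(build("Canada is cold cold. I feel good not bad"));
        // prints {canada=1, is=1, cold=2, i=1, feel=1, good=1, not=1, bad=1}
    }
}
```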
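For the comparison and tagging steps, the sketch below builds on the BagOfWords class above. It assumes the positive and negative word lists have been saved locally as positive-words.txt and negative-words.txt with one word per line, and it uses a simple "positive matches minus negative matches" score; the file names, file format, and scoring rule are all assumptions, so adjust them to the lists you actually download.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative polarity tagging: counts bag-of-words entries that appear in a
// positive and in a negative word list, then tags the article by the net score.
// File names, file format, and the scoring rule are assumptions for this sketch.
public class PolarityTagger {
    static Set<String> loadWords(String path) throws IOException {
        return new HashSet<>(Files.readAllLines(Paths.get(path)));
    }

    public static String tag(Map<String, Integer> bow,
                             Set<String> positive, Set<String> negative) {
        int score = 0;
        for (Map.Entry<String, Integer> entry : bow.entrySet()) {
            if (positive.contains(entry.getKey())) score += entry.getValue();
            if (negative.contains(entry.getKey())) score -= entry.getValue();
        }
        if (score > 0) return "positive";
        if (score < 0) return "negative";
        return "neutral";
    }

    public static void main(String[] args) throws IOException {
        Set<String> positive = loadWords("positive-words.txt");
        Set<String> negative = loadWords("negative-words.txt");
        Map<String, Integer> bow = BagOfWords.build("Canada is cold cold. I feel good not bad");
        // With lists similar to the example in the table above, this article
        // would likely come out as "negative".
        System.out.println(tag(bow, positive, negative));
    }
}
```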
Problem #3
Semantic Analysis
1. For this task, consider the processed news collection that you created in Assignment 3.
2. Use the following steps to compute TF-IDF (term frequency-inverse document frequency):
- Suppose you have 50 news articles (description or content only) stored in 50 JSON arrays. Consider these data points as the total number of documents (N). In this case, N = 50.
- Now, use the search terms Canada, Moncton, and Toronto, and count how many documents each of these words appears in.
Total Documents (N) = 50

| Search Query | Documents containing term (df) | N/df | log10(N/df) |
|---|---|---|---|
| Canada | 30 | 50/30 | 0.221848749 |
| Moncton | 5 | 50/5 | 1 |
| Toronto | 10 | 50/10 | 0.698970004 |
- Once you build the above table, you need to find which document has the highest occurrence of the word Canada. You can find this by performing a frequency count of the word per document.
Term: Canada (appeared in 30 documents)

| Article | Total Words (m) | Frequency (f) |
|---|---|---|
| Article #1 | 6 | 2 |
| Article #2 | 10 | 1 |
| : | : | : |
| Article #30 | 8 | 1 |
- You should print the news article (programmatically) that has the highest relative frequency, which you can find by computing f/m. (A minimal sketch of this calculation follows below.)
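As a rough illustration of the arithmetic in this problem (df, log10(N/df), and the relative frequency f/m), the sketch below runs the calculation for one query term over a few hard-coded placeholder strings. In your submission the articles would instead be the news descriptions loaded from MongoDB, and the query term is only an example.

```java
import java.util.List;

// Illustrative df / log10(N/df) / relative-frequency calculation for one term.
// The hard-coded articles are placeholders for the documents loaded from MongoDB.
public class TermRelevance {
    public static void main(String[] args) {
        List<String> articles = List.of(
                "Canada announces new weather warning for Moncton",
                "Toronto temperatures drop as Canada braces for winter",
                "Local festival brings crowds downtown");
        String term = "canada";                  // example query term

        int n = articles.size();                 // total number of documents (N)
        int df = 0;                              // documents containing the term
        double bestScore = -1.0;
        String bestArticle = null;

        for (String article : articles) {
            String[] words = article.toLowerCase().split("\\s+");
            int m = words.length;                // total words in this article
            int f = 0;                           // occurrences of the term
            for (String w : words) {
                if (w.replaceAll("[^a-z]", "").equals(term)) f++;
            }
            if (f > 0) {
                df++;
                double relative = (double) f / m;        // relative frequency f/m
                if (relative > bestScore) {
                    bestScore = relative;
                    bestArticle = article;
                }
            }
        }

        if (df == 0) {
            System.out.println("Term not found in any document.");
            return;
        }
        double idf = Math.log10((double) n / df);        // log10(N/df), as in the table
        System.out.println("df = " + df + ", log10(N/df) = " + idf);
        System.out.println("Article with highest f/m: " + bestArticle);
    }
}
```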
Assignment 4 Submission Format:
- Compress all your reports/files into a single .zip file and give it a meaningful name. You are free to choose any meaningful file name, preferably BannerId_Lastname_firstname_5408_A4, but avoid generic names like assignment-4.
- Submit your reports only in PDF format. Please avoid submitting .doc/.docx and submit only the PDF version. You can merge all the reports into a single PDF or keep them separate. You should also include output (if any) and test cases (if any) in the PDF file in the proper format (e.g. tables, charts, etc. as required).
- Your executable code/script needs to be submitted on https://git.cs.dal.ca/