Name: [Solved] CSCI5408-Data Management,WarehousingAnalytics -Assignment #2

Objective:

The objective of this assignment is to understand research and industry problems related to distributed database operations and transaction management.

Assignment Rubric

	Excellent (25%)	Proficient (15%)	Marginal (5%)	Unacceptable (0%)	This Rubric Applied to
Completeness including Citation	All required tasks are completed	Submission highlights tasks completion. However, missed some tasks in between, which created a disconnection	Some tasks are completed, which are disjoint in nature.	Incorrect and irrelevant	Problem #1
Correctness	All parts of the given tasks are correct	Most of the given tasks are correct. However, some portions need minor modifications	Most of the given tasks are incorrect. The submission requires major modifications.	Incorrect and unacceptable	Problem #2 Task 2
Novelty	The submission contains novel contribution in key segments, which is a clear indication of application knowledge	The submission lacks novel contributions. There are some evidences of novelty, however, it is not significant	The submission does not contain novel contributions. However, there is an evidence of some effort	There is no novelty	Problem #2 Task 2
Clarity	The written or graphical materials, and developed applications provide a clear picture of the concept, and highlights the clarity	The written or graphical materials and developed applications do not show clear picture of the concept. There is room for improvement	The written or graphical materials, and developed applications fail to prove the clarity. Background knowledge is needed	Failed to prove the clarity. Need proper background knowledge to perform the tasks	Problem #2 Task 1

Citation:

McKinney, B. (2018). The impact of program-wide discussion board grading rubrics on students and faculty satisfaction. Online Learning, 22(2), 289-299.

Problem #1: This problem contains two reading tasks.

Reading Material #1: To retrieve the paper, visit IEEE database through libraries.dal.ca

M. Sharma and G. Singh, "Analysis of Joins and Semi-joins in Centralized and Distributed Database Queries," 2012 International Conference on Computing Sciences, Phagwara, 2012, pp. 15-20, doi: 10.1109/ICCS.2012.15.

Reading Material #2: To retrieve the paper, visit IEEE database through libraries.dal.ca

Kate, A. Jaiswal and A. Gehlot, "A survey on distributed deadlock and distributed algorithms to detect and resolve deadlock," 2016 Symposium on Colossal Data Analysis and Networking (CDAN), Indore, 2016, pp. 1-6, doi: 10.1109/CDAN.2016.7570873.

Read the papers and perform the following:

Write a summary (about 1 page per paper) of each paper in your own words. You do not need to add images, figures, or tables from the paper. However, you can add your own block diagrams or flowcharts to support the summary you have written.
What is the central idea of the discussion?
Did you find any topic of interest in this paper? If yes, what are the topics, and why do you think they are interesting? If no, then in your view, what are the shortcomings of this paper?

Submission Expectations: A 1-page report for each paper (2 pages in total) containing the summary and analysis

Problem #1 Submission Requirements: A single PDF file (2 pages for two summaries)

Problem #2: This problem contains two tasks: one logical task and one programming task.

Research and Development: You need to simulate a distributed DBMS

Visit the website and extract the following datasets:

https://www.kaggle.com/olistbr/brazilian-ecommerce?select=olist_order_payments_dataset.csv

Problem Scenario: A company, Data5408, has two branches, VM1 and VM2. Assume that the datasets you received from Kaggle are data of Data5408. In this question, you need to perform two tasks:

Task 1: Build Distributed Database

If the datasets are converted to database tables and database(s), how will they be placed? State the reasons (e.g., why did you consider a specific fragmentation, transparency, etc.).

You need to create two MySQL instances in two GCP Virtual Machines {VM1 and VM2}. Your VM1 site is responsible for storing customer, geolocation, and user related information. The VM2 site is responsible for storing all remaining information, such as item, product, payments, etc.

[Note: If you experience issues in handling large datasets, then consider a random, reasonably sized (<1000 data points) subset of the given data.]
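One way to draw such a random subset, if you prefer code over a spreadsheet, is sketched below. The class name `CsvSampler` and its parameters are hypothetical, and the sketch assumes that every record occupies exactly one line (it does not handle quoted fields that contain line breaks) and that the first line is a header.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: keeps the header and a random subset of data lines. */
final class CsvSampler {
    /**
     * Returns the header line followed by at most maxRows randomly chosen data lines.
     * Assumes one record per line; a fixed seed makes the subset reproducible.
     */
    static List<String> sample(List<String> csvLines, int maxRows, long seed) {
        if (maxRows < 0) {
            throw new IllegalArgumentException("maxRows must not be negative");
        }
        List<String> out = new ArrayList<>();
        if (csvLines.isEmpty()) {
            return out;
        }
        // Copy the data lines so the caller's list is left untouched.
        List<String> rows = new ArrayList<>(csvLines.subList(1, csvLines.size()));
        Collections.shuffle(rows, new Random(seed));
        out.add(csvLines.get(0)); // header
        out.addAll(rows.subList(0, Math.min(maxRows, rows.size())));
        return out;
    }
}
```

Recording the seed and the chosen row limit in the PDF keeps the cleaning step reproducible.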

If required, please perform data cleaning, decomposition of the dataset, etc. before creating the database, and record your logic in the PDF. Cleaning using a spreadsheet is sufficient.

Since Data5408 implemented a distributed database, it should create and maintain a Global Data Catalogue or Global Data Dictionary. How will you create it? Where will it be placed?

[Hint: A global data dictionary (GDD) is an additional component, which does not eliminate the need for local data dictionaries. A GDD usually contains information on databases and tables that are located at different sites and connected using the network.]
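To make the hint concrete, the following is a minimal in-memory sketch of what a GDD records: for each table, the site and database that hold it. The class names, table names, and schema names (`data5408_vm1`, `data5408_vm2`) are assumptions for illustration only. How the GDD is actually stored and where it is placed is the design question the task asks you to answer.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** One catalogue entry: the site and database that hold a table. */
final class TableLocation {
    final String site;     // "VM1" or "VM2"
    final String database; // hypothetical schema name
    TableLocation(String site, String database) {
        this.site = site;
        this.database = database;
    }
}

/** Minimal global data dictionary: table name -> location. */
final class GlobalDataDictionary {
    private final Map<String, TableLocation> entries = new LinkedHashMap<>();

    void register(String table, String site, String database) {
        entries.put(table.toLowerCase(Locale.ROOT), new TableLocation(site, database));
    }

    /** Looks up a table case-insensitively; empty if the table is unknown. */
    Optional<TableLocation> locate(String table) {
        return Optional.ofNullable(entries.get(table.toLowerCase(Locale.ROOT)));
    }

    /** Placement following the task description; all names here are assumptions. */
    static GlobalDataDictionary forData5408() {
        GlobalDataDictionary gdd = new GlobalDataDictionary();
        gdd.register("customers", "VM1", "data5408_vm1");
        gdd.register("geolocation", "VM1", "data5408_vm1");
        gdd.register("order_items", "VM2", "data5408_vm2");
        gdd.register("products", "VM2", "data5408_vm2");
        gdd.register("order_payments", "VM2", "data5408_vm2");
        return gdd;
    }
}
```

A query processor would call `locate("customers")` to decide which MySQL instance to contact, while each instance still keeps its own local data dictionary.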

You do not have to write an SQL script for this part; you can use an import statement to upload your clean tables to the VM1 and VM2 databases.

Problem #2 Task 1 Submission Requirements:

A single PDF file with data cleaning, formatting logic or screenshots
Screenshots of VM1, VM2 MySQL instances
SQL dump {structure and value} taken from VM1, and VM2

Task 2: Perform Concurrent Remote Transactions (programming needed) on a single DBMS (VM1 MySQL)

Write simple DBMS transaction processing logic as a Java program*, and run the program on your local machine (TP). This program will access the VM1 MySQL instance (DP) and execute concurrent remote transactions.

Your program will perform three concurrent executions of transactions written in SQL.
Your program will also create a simple text file, which will act as a Transaction Log.
The details of the transactions are given below:

You must follow the sequence. Write your observation on how MySQL handled this particular case.

Tab 1: The sequence of transactions entering the system for execution

	T1	T2	T3
Sequence 1	Read customers data where zip code = 01151	Read customers data where zip code = 01151	
Sequence 2	Update retrieved customers city to T1 City		Read customers data where zip code = 01151
Sequence 3		Update retrieved customers city to T2 City	Update retrieved customers city to T3 City
Sequence 4	Commit		Commit
Sequence 5		Commit	
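The transaction log mentioned above can be as simple as an append-only text file with one line per event in Tab 1. The sketch below is one possible shape, using only the standard library. The class name `TransactionLog` and the line format (timestamp, transaction name, event) are assumptions, since the assignment does not prescribe either.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.List;

/** Hypothetical append-only transaction log backed by a plain text file. */
final class TransactionLog {
    private final Path file;

    TransactionLog(Path file) {
        this.file = file;
    }

    /**
     * Appends "timestamp transaction event" as one line.
     * synchronized so that lines written by concurrent transaction threads
     * in the same program do not interleave.
     */
    synchronized void record(String transaction, String event) {
        String line = Instant.now() + " " + transaction + " " + event + System.lineSeparator();
        try {
            Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the log back, one entry per list element. */
    synchronized List<String> lines() {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each transaction thread would call, for example, `log.record("T1", "READ zip=01151")` before or after the corresponding SQL statement, which makes the observed ordering easy to compare against Tab 1.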

Modify your program and add one or two new method(s) to create exclusive locks for the data. A transaction must obtain a lock based on the sequence of arrival (Tab 1), and must release it after the operation.
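One possible reading of this requirement is an application-level lock inside the Java program, sketched below with the standard `ReentrantLock` in fair mode, which favors the longest-waiting thread under contention. The class name `ExclusiveLockManager`, its method names, and the idea of keying locks by a string such as `"zip:01151"` are assumptions. This is not MySQL's own locking and not the only acceptable design.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical application-level exclusive locks, one per data item. */
final class ExclusiveLockManager {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Blocks until the calling thread holds the exclusive lock for the item. */
    void acquire(String item) {
        // fair = true: under contention, the longest-waiting thread is favored
        locks.computeIfAbsent(item, k -> new ReentrantLock(true)).lock();
    }

    /** Releases the lock; fails loudly if the caller does not hold it. */
    void release(String item) {
        ReentrantLock lock = locks.get(item);
        if (lock == null || !lock.isHeldByCurrentThread()) {
            throw new IllegalStateException("lock not held for " + item);
        }
        lock.unlock();
    }

    /** True while some thread holds the lock for the item. */
    boolean isLocked(String item) {
        ReentrantLock lock = locks.get(item);
        return lock != null && lock.isLocked();
    }

    /** Acquires the lock, runs the operation, and always releases afterwards. */
    void runExclusive(String item, Runnable operation) {
        acquire(item);
        try {
            operation.run();
        } finally {
            release(item);
        }
    }
}
```

Wrapping each transaction's read, update, and commit in `runExclusive("zip:01151", ...)` serializes T1, T2, and T3 on that data item, so the effect of locking can be compared with the unlocked run.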

* You can only use standard libraries.

Problem #2 Task 2 Submission Requirements:

Upload your program code, both before adding the locking logic and after the modification with the locking logic, to GitLab (https://git.cs.dal.ca).
Provide screenshots of your concurrent transaction testing with and without the locking logic.

Assignment Submission Instructions:

Two PDF files: Problem #1 and Problem #2 (Task 1)
Two SQL dump files related to Problem #2 (Task 1), with the .SQL extension
Program code (before and after modification) for Problem #2 (Task 2) should be in GitLab.
