Name: [Solved] COMP9318 Assignment 1
Brand: Assignment Chef
SKU: [Solved] COMP9318 Assignment 1
Price: 25 USD
Availability: InStock
Rating: 5 (1 reviews)

5/5 - (1 vote)

Consider the following base cuboid Sales with four tuples and the aggregate function SUM:

Location	Time	Item	Quantity
Sydney	2005	PS2	1400
Sydney	2006	PS2	1500
Sydney	2006	Wii	500
Melbourne	2005	XBox 360	1700

Location, Time, and Item are dimensions and Quantity is the measure. Suppose the system has built-in support for the value ALL.

List the tuples in the complete data cube of R in a tabular form with 4 attributes,

i.e., Location,Time,Item,SUM(Quantity)?

Write down an equivalent SQL statement that computes the same result (i.e., the cube). You can only use standard SQL constructs, i.e., no CUBE BY
Consider the following ice-berg cube query:

SELECT Location, Time, Item, SUM(Quantity)

FROM Sales

CUBE BY Location, Time, Item HAVING COUNT(*) > 1

Draw the result of the query in a tabular form.

Assume that we adopt a MOLAP architecture to store the full data cube of R, with the following mapping functions:

= Sydney, = Melbourne,

0 if x = ALL.

if x = 2005, if x = 2006,

0 if x = ALL.

1 if x = PS2,

2f_Item(x) =30

if x = XBox 360, if x = Wii, if x = ALL.

Draw the MOLAP cube (i.e., sparse multi-dimensional array) in a tabular form of (ArrayIndex,V alue). You also need to write down the function you chose to map a multi-dimensional point to a one-dimensioinal point.

Q2. (30 marks)

Consider the given similarity matrix. You are asked to perform group average hierarchical clustering on this dataset.

You need to show the steps and final result of the clustering algorithm. You will show the final results by drawing a dendrogram. The dendrogram should clearly show the order in which the points are merged.

	p₁	p₂	p₃	p₄	p₅
p₁	1.00	0.10	0.41	0.55	0.35
p₂	0.10	1.00	0.64	0.47	0.98
p₃	0.41	0.64	1.00	0.44	0.85
p₄	0.55	0.47	0.44	1.00	0.76
p₅	0.35	0.98	0.85	0.76	1.00

Q3. (30 marks)

Algorithm 1: k-means(D, k)

Data: D is a dataset of n d-dimensional points; k is the number of clusters.

1 Initialize k centers C = [c₁,c₂,,c_k];

Consider the (slightly incomplete) k-means clustering algorithm as depicted in Algorithm 1.

(1) Assume that the stopping criterion is till the algorithm converges to the final k clusters. Can you insert several lines of pseudo-code to the algorithm to implement this logic? You are not allowed to change the first 7 lines though. (2) The cost of k clusters is just the total cost of each group g_i, or formally

cost(g₁,g₂,,g_k) = ^Xcost(g_i)

i=1

cost(g_i) is the sum of squared distances of all its constituent points to the center c_i, or

cost(g_i) = ^Xdist²(p,c_i)

pg_i

dist() is the Euclidean distance. Now show that the cost of k clusters as evaluated at the end of each iteration (i.e., after Line 9 in the current algorithm) never increases. (You may assume d = 2)

(3) Prove that the cost of clusters obtained by k-means algorithm always converges to a local minima. You can make use of the previous conclusion even if you have not proved it.

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Whatsapp Us

[Solved] COMP9318 Assignment 1

Reviews