- Provide short answers (in one to two paragraphs) to the following questions [Part A, 20+40 = 60 marks]
Provide a PDF (or Word) file named q3.pdf for this question. Provide two plain-text Spark/Scala program files, q3b-1.scala and q3b-2.scala.
Programming Related:
- Compare and contrast an Apache Spark Dataset with a DataFrame. (5 marks)
- Compare and contrast reservoir sampling and Bloom filters with the aid of illustrative Scala/Spark code. (5+2*5 = 15 marks)
- List the main benefits of integrating HBase with the Hadoop MapReduce framework. (5 marks)
- Explain how Hadoop implements computational parallelism in terms of the parallel dwarf(s) it employs and Flynn's taxonomy. (5 + 5 = 10 marks)
- Outline the main design features of the RDD abstraction for in-memory cluster computing. (15 marks)