Overview
This project is designed to give you a basic familiarity with the MapReduce paradigm via the Hadoop framework. We give you Dockerfiles that enable you to build and start a Hadoop cluster pre-loaded with the Hadoop word-count example and text data files.
Your first task is to use the Docker console to start up a Hadoop cluster consisting of one Hadoop/HDFS master node and four (4) Hadoop/HDFS client nodes.
Your second task is to run the WordCount example, as is, without modification, and confirm that your cluster is working.
Your third task is to modify the WordCount example to create a BigramExample, which counts the number of bigrams within the corpus. At the conclusion of its execution, it should output three pieces of information, one per line: (1) the total number of bigrams, (2) the most common bigram, and (3) the number of bigrams required to add up to 10% of all bigrams.
Collaboration
You may work with up to one other person on this project.
Counting Bigrams
For the purposes of this assignment, don't get fancy. We know it is the end of the quarter. Just use static variables within the mapper to keep track of the current word and the previous word. Output a count of one for each pair, and then combine and reduce from there. This gives you a sorted histogram of bigrams.
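A minimal sketch of such a mapper follows, assuming the class name BigramMapper and simple whitespace tokenization; both of those choices are ours, not requirements of the assignment.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical class name; adapt to your BigramExample layout.
    public class BigramMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        // Static variable carrying the previous word across calls to map(),
        // as suggested above. Note that it also carries across input lines
        // within a map task, which is fine for this assignment.
        private static String previous = null;

        private final Text bigram = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                String current = itr.nextToken();
                if (previous != null) {
                    // Emit ("previous current", 1) for each adjacent word pair.
                    bigram.set(previous + " " + current);
                    context.write(bigram, ONE);
                }
                previous = current;
            }
        }
    }

With the mapper emitting a count of one per pair, the IntSumReducer from the stock WordCount example can be reused unchanged as both the combiner and the reducer, since summing per-key counts is exactly the word-count computation applied to bigram keys.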
Generating the Output
Once you have the histogram of the bigrams, it is straightforward to determine the most frequent, or even the top-N, bigrams. If you reduce the histogram of bigrams to a count of bigrams, it will be straightforward to determine how many of the top-N are needed to reach at least 10% of the total.
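One way to produce the three required output lines is sketched below. The Entry holder class and the printSummary method are hypothetical names of ours, and we assume the per-bigram totals from the reduce phase have already been read into an in-memory list.

    import java.util.Comparator;
    import java.util.List;

    public class BigramSummary {
        // Hypothetical value holder: a bigram and its total count.
        public static class Entry {
            final String bigram;
            final long count;
            Entry(String bigram, long count) { this.bigram = bigram; this.count = count; }
        }

        public static void printSummary(List<Entry> counts) {
            // Sort descending by count so the first entry is the most common bigram.
            counts.sort(Comparator.comparingLong((Entry e) -> e.count).reversed());

            long total = 0;
            for (Entry e : counts) {
                total += e.count;
            }

            // (1) the total number of bigrams (counted with multiplicity here;
            //     check whether the assignment intends distinct bigrams instead)
            System.out.println(total);

            // (2) the most common bigram (assumes a non-empty corpus)
            System.out.println(counts.get(0).bigram);

            // (3) how many of the most frequent bigrams are needed to add up
            //     to at least 10% of all bigrams
            long running = 0;
            int needed = 0;
            for (Entry e : counts) {
                running += e.count;
                needed++;
                if (running * 10 >= total) {
                    break;
                }
            }
            System.out.println(needed);
        }
    }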