[SOLVED] C / Python / Scala Spark parallel assignment: Questions 1-5

Question 1 (2 pts)
The following piece of code computes the frequencies of the words in a text file:
from operator import add
lines = sc.textFile('README.md')
counts = lines.flatMap(lambda x: x.split()) \
    .map(lambda x: (x, 1)) \
    .reduceByKey(add)
Add one line to find the most frequent word. Output this word and its frequency.
Hint: use sortBy(), reduce(), or max().
Answer:
Using Python version 2.7.12 (default, Dec 4 2017 14:50:18)
SparkSession available as 'spark'.
>>> from operator import add
>>> lines = sc.textFile('README.md')
>>> counts = lines.flatMap(lambda x: x.split()).map(lambda x: (x, 1)).reduceByKey(add)
>>> print(counts.map(lambda (k, v): (v, k)).sortByKey(False).take(1))
[(24, u'the')]
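An alternative, per the hint, is max(): on the RDD, counts.max(key=lambda kv: kv[1]) returns the (word, count) pair with the largest count. The pipeline logic can also be sanity-checked without a cluster by emulating it in plain Python (a sketch with made-up sample lines, not the real README.md):

```python
from collections import Counter

# Made-up lines standing in for the text file.
lines = ["the quick brown fox", "the lazy dog", "the end"]

words = [w for line in lines for w in line.split()]  # flatMap(split)
counts = Counter(words)                              # map + reduceByKey(add)

# Keyed max over (word, count) pairs, mirroring
# counts.max(key=lambda kv: kv[1]) on the RDD.
word, freq = max(counts.items(), key=lambda kv: kv[1])
print(word, freq)  # the 3
```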

Question 2 (2 pts)
Modify the word count example above, so that we only count the frequencies of those words consisting of 5 or more characters.
Answer:
Using Python version 2.7.12 (default, Dec 4 2017 14:50:18)
SparkSession available as 'spark'.
>>> from operator import add
>>> lines = sc.textFile('README.md')
>>> counts = lines.flatMap(lambda x: x.split())
>>> y = counts.filter(lambda word: len(word) >= 5)
>>> t = y.map(lambda x: (x, 1))
>>> p = t.reduceByKey(add)
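For reference, the four statements chain into one pipeline: lines.flatMap(lambda x: x.split()).filter(lambda w: len(w) >= 5).map(lambda x: (x, 1)).reduceByKey(add). The new filter step can be checked without Spark in plain Python (a sketch with made-up input lines):

```python
# Made-up sample lines standing in for README.md.
lines = ["Spark makes parallel computation simple", "run it on a cluster"]

words = [w for line in lines for w in line.split()]   # flatMap(split)
long_words = [w for w in words if len(w) >= 5]        # filter(len(w) >= 5)
print(long_words)
# ['Spark', 'makes', 'parallel', 'computation', 'simple', 'cluster']
```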

Question 3 (1 pt)
Consider the following piece of code:
A = sc.parallelize(xrange(1, 100))
t = 50
B = A.filter(lambda x: x < t)
print B.count()
t = 10
C = B.filter(lambda x: x > t)
print C.count()
What’s its output? (Yes, you can just run it.)

Answer:
Using Python version 2.7.12 (default, Dec 4 2017 14:50:18)
SparkSession available as 'spark'.
>>> A = sc.parallelize(xrange(1, 100))
>>> t = 50
>>> B = A.filter(lambda x: x < t)
>>> print B.count()
49  # output
>>> t = 10
>>> C = B.filter(lambda x: x > t)
>>> print C.count()
0  # output
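The second count is 0 because RDD transformations are lazy: the lambda in B captures the name t, not its value, and is only evaluated when an action runs. By the time C.count() executes, t is 10, so B's filter keeps x < 10 and C's filter then demands x > 10, which matches nothing. Plain Python closures show the same late binding (a minimal sketch):

```python
t = 50
keep = lambda x: x < t   # captures the name t, not the value 50
t = 10                   # rebinding t changes what keep() computes
filtered = [x for x in range(1, 100) if keep(x)]
print(filtered)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```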

Question 4 (2 pts)
The intent of the code above is to get all numbers below 50 from A into B, and then all numbers above 10 from B into C. Fix the code so that it produces the desired behavior by adding one line of code. You are not allowed to change the existing code.
Answer:
Using Python version 2.7.12 (default, Dec 4 2017 14:50:18)
SparkSession available as 'spark'.
>>> A = sc.parallelize(xrange(1, 100))
>>> t = 50
>>> B = A.filter(lambda x: x < t)
>>> B.cache()
PythonRDD[1] at RDD at PythonRDD.scala:48
>>> print B.count()
49
>>> t = 10
>>> C = B.filter(lambda x: x > t)
>>> print C.count()
39
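cache() works here because it pins B's first materialization: B.count() runs while t is still 50, the cached partitions hold 1..49, and C's filter with t = 10 then runs against that snapshot instead of re-running A's filter. (Strictly, cache() is a hint; if the cached partitions were evicted, B would be recomputed with the new t.) The intended behavior, sketched in plain Python:

```python
t = 50
B = [x for x in range(1, 100) if x < t]  # materialized once, like a cached RDD
t = 10
C = [x for x in B if x > t]              # filters the snapshot, not the source
print(len(C))  # 39
```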

Question 5 (3 pts)
Modify the PMI example by sending a_dict and n_dict inside the closure. Do not use broadcast variables.
Answer:
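No answer was recorded for this question in the original. The idea is that referencing driver-side variables directly inside a function passed to map() puts them in the task closure, so each task receives a serialized copy without sc.broadcast(). A minimal sketch, assuming hypothetical contents for a_dict (single-word counts), n_dict (pair counts), and a total count n_total; the real PMI example computes these from the corpus:

```python
import math

# Hypothetical driver-side data (values assumed for illustration only).
n_total = 100.0
a_dict = {"spark": 10, "parallel": 5}   # individual word counts
n_dict = {("spark", "parallel"): 4}     # co-occurrence counts

def pmi(pair):
    # a_dict, n_dict, and n_total are referenced directly here, so Spark
    # would serialize them into each task's closure -- no broadcast variables.
    x, y = pair
    p_xy = n_dict[pair] / n_total
    p_x = a_dict[x] / n_total
    p_y = a_dict[y] / n_total
    return (pair, math.log(p_xy / (p_x * p_y), 2))

# On a cluster: sc.parallelize(n_dict.keys()).map(pmi).collect()
print(pmi(("spark", "parallel")))
```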
