Convert Apache Spark Scala code to Python

Can anyone convert this very simple Scala code to Python?

val words = Array("one", "two", "two", "three", "three", "three")
val wordPairsRDD = sc.parallelize(words).map(word => (word, 1))

val wordCountsWithGroup = wordPairsRDD
    .groupByKey()
    .map(t => (t._1, t._2.sum))
    .collect()
muktadiur asked Mar 02 '26 14:03


2 Answers

Try this:

words = ["one", "two", "two", "three", "three", "three"]
wordPairsRDD = sc.parallelize(words).map(lambda word: (word, 1))

# Parentheses let the method chain span several lines in Python.
wordCountsWithGroup = (
    wordPairsRDD
    .groupByKey()
    .map(lambda t: (t[0], sum(t[1])))
    .collect()
)
Soroosh Sarabadani answered Mar 04 '26 04:03


Two translations in Python:

from operator import add

wordsList = ["one", "two", "two", "three", "three", "three"]

# Variant 1: reduceByKey sums the counts for each word.
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).reduceByKey(add).collect()
print(words)

# Variant 2: groupByKey, then sum each group (mirrors the Scala code).
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).groupByKey().map(lambda t: (t[0], sum(t[1]))).collect()
print(words)
Junayy answered Mar 04 '26 03:03
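If you want to check what the two translations above should return without starting Spark, here is a minimal plain-Python sketch of the same logic. Note that `group_by_key` and `reduce_by_key` are hypothetical local helpers written for this illustration. They are not part of the PySpark API and only imitate what `groupByKey` and `reduceByKey` do to a small in-memory list.

```python
from collections import defaultdict
from operator import add

words = ["one", "two", "two", "three", "three", "three"]
# Same as .map(lambda word: (word, 1)) on the RDD.
pairs = [(word, 1) for word in words]


def group_by_key(pairs):
    """Hypothetical local stand-in for groupByKey: collect all values per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups.items())


def reduce_by_key(func, pairs):
    """Hypothetical local stand-in for reduceByKey: fold values per key with func."""
    totals = {}
    for key, value in pairs:
        totals[key] = func(totals[key], value) if key in totals else value
    return list(totals.items())


# groupByKey variant: build the full list of 1s per word, then sum it.
counts_with_group = [(key, sum(values)) for key, values in group_by_key(pairs)]
# reduceByKey variant: add the 1s together as they arrive.
counts_with_reduce = reduce_by_key(add, pairs)

# Sorted because Spark's collect() does not guarantee any key order.
print(sorted(counts_with_group))
print(sorted(counts_with_reduce))
```

Both variants give the same counts. On a real cluster the `reduceByKey` version is usually the better choice, because Spark can combine values within each partition before shuffling, while `groupByKey` sends every `(word, 1)` pair across the network first.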
