 

reduceByKey method not being found in Scala Spark


I'm attempting to run the standalone Scala app example (http://spark.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala) from source.

This line:

val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b) 

is throwing the error:

value reduceByKey is not a member of org.apache.spark.rdd.RDD[(String, Int)]   val wordCounts = logData.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b) 

logData.flatMap(line => line.split(" ")).map(word => (word, 1)) returns a MappedRDD, but I cannot find this type in http://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.rdd.RDD

I'm running this code from the Spark source, so could it be a classpath problem? The required dependencies are on my classpath, though.

asked May 29 '14 by blue-sky

People also ask

How many RDDs can cogroup() can work at once?

cogroup() can be used for much more than just implementing joins. We can also use it to implement intersect by key. Additionally, cogroup() can work on three or more RDDs at once.
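
For example, a minimal sketch of cogroup over three pair RDDs (the RDD contents and the existing SparkContext `sc` are assumptions for illustration; on Spark versions before 1.3 the SparkContext._ import is needed for pair-RDD methods):

import org.apache.spark.SparkContext._

val sales   = sc.parallelize(Seq(("apple", 3), ("pear", 1)))
val stock   = sc.parallelize(Seq(("apple", 10), ("plum", 4)))
val returns = sc.parallelize(Seq(("pear", 2)))

// One output record per key, holding an Iterable of values from each input RDD,
// e.g. ("apple", (Iterable(3), Iterable(10), Iterable()))
val grouped = sales.cogroup(stock, returns)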

Is reduceByKey a transformation or action?

In Spark, the reduceByKey function is a frequently used transformation operation that performs aggregation of data. It receives key-value pairs (K, V) as an input, aggregates the values based on the key and generates a dataset of (K, V) pairs as an output.
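
A small sketch of that distinction (assuming an existing SparkContext `sc`): reduceByKey only builds the RDD lineage, while an action such as collect() actually runs the job.

val pairs  = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val summed = pairs.reduceByKey(_ + _)   // transformation: nothing is computed yet
val result = summed.collect()           // action: triggers the computation
// result contains ("a", 4) and ("b", 2); ordering of keys may vary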

What is reduceByKey?

Merge the values for each key using an associative and commutative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a “combiner” in MapReduce.
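
The reduce function is not limited to summing; any associative and commutative function works. A brief sketch (again assuming `sc`) keeping the maximum value per key:

val temps   = sc.parallelize(Seq(("mon", 18), ("mon", 21), ("tue", 17)))
val maxTemp = temps.reduceByKey(math.max(_, _))   // ("mon", 21), ("tue", 17)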

How to initialize Spark?

Initializing Spark To create a SparkContext you first need to build a SparkConf object that contains information about your application. Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one.
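
A minimal sketch of that initialization (the app name and local master are placeholder assumptions):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("WordCountApp").setMaster("local[*]")
val sc   = new SparkContext(conf)   // only one SparkContext may be active per JVM

// ... run jobs with sc ...

sc.stop()   // stop the active context before creating a new one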


1 Answer

You should import the implicit conversions from SparkContext:

import org.apache.spark.SparkContext._ 

These implicits use the 'pimp my library' pattern to add methods to RDDs of specific types. If curious, see SparkContext:1296
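
For reference, a sketch of the quick-start standalone app with that import added (the app name and input file follow the guide; exact values are assumptions). From Spark 1.3 onward these implicits live in the RDD companion object, so the explicit import should no longer be necessary, but it does not hurt.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // brings reduceByKey into scope on RDD[(String, Int)]

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile("README.md")   // any text file will do

    // Resolves now: the implicit conversion to PairRDDFunctions adds reduceByKey
    val wordCounts = textFile.flatMap(line => line.split(" "))
                             .map(word => (word, 1))
                             .reduceByKey((a, b) => a + b)

    wordCounts.collect().foreach(println)
    sc.stop()
  }
}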

answered Sep 17 '22 by maasg