I'm attempting to run the Scala standalone app from http://spark.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala, built from source.
This line:

    val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)

is throwing the error:

    value reduceByKey is not a member of org.apache.spark.rdd.RDD[(String, Int)]
    val wordCounts = logData.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
The expression

    logData.flatMap(line => line.split(" ")).map(word => (word, 1))

returns a MappedRDD, but I cannot find this type in http://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.rdd.RDD
I'm running this code from the Spark source, so could it be a classpath problem? The required dependencies are on my classpath, though.
cogroup() can be used for much more than just implementing joins. We can also use it to implement intersect by key. Additionally, cogroup() can work on three or more RDDs at once.
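For example, a minimal sketch of intersect-by-key via cogroup() might look like the following (the RDD contents and app name are made up for illustration):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // implicit conversions for pair-RDD methods such as cogroup

    object CogroupIntersect {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "cogroup-intersect")
        val left  = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
        val right = sc.parallelize(Seq(("b", 20), ("c", 30), ("d", 40)))

        // cogroup yields (key, (valuesFromLeft, valuesFromRight));
        // a key is in the intersection when both value groups are non-empty
        val intersectByKey = left.cogroup(right)
          .filter { case (_, (ls, rs)) => ls.nonEmpty && rs.nonEmpty }

        intersectByKey.collect().foreach(println)   // prints the entries for "b" and "c"
        sc.stop()
      }
    }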
In Spark, reduceByKey is a commonly used transformation that aggregates data. It takes a dataset of key-value pairs (K, V) as input, merges the values for each key using an associative and commutative reduce function, and produces a dataset of (K, V) pairs as output. The merging is also performed locally on each mapper before results are sent to a reducer, similar to a "combiner" in MapReduce.
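A small illustrative sketch (the data and app name are made up) that sums values per key with reduceByKey:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // brings reduceByKey and other pair-RDD methods into scope

    object ReduceByKeyExample {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "reduceByKey-example")
        // (K, V) input pairs
        val sales = sc.parallelize(Seq(("apple", 2), ("pear", 5), ("apple", 3), ("pear", 1)))
        // merge values per key with an associative, commutative function;
        // partial sums are combined map-side before the shuffle
        val totals = sales.reduceByKey((a, b) => a + b)
        totals.collect().foreach(println)   // (apple,5), (pear,6)
        sc.stop()
      }
    }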
Initializing Spark: to create a SparkContext you first need to build a SparkConf object that contains information about your application. Only one SparkContext may be active per JVM; you must stop() the active SparkContext before creating a new one.
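A minimal initialization sketch (the master URL and app name here are arbitrary placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    object InitExample {
      def main(args: Array[String]) {
        // build a SparkConf describing the application, then create the single SparkContext
        val conf = new SparkConf().setAppName("Simple App").setMaster("local[2]")
        val sc = new SparkContext(conf)
        // ... use sc ...
        sc.stop()   // stop the active context before creating a new one in this JVM
      }
    }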
You should import the implicit conversions from SparkContext:

    import org.apache.spark.SparkContext._
They use the 'pimp my library' pattern to add methods to RDDs of specific types. If curious, see SparkContext:1296.
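Applied to the quick-start app, a sketch of the fix looks like this (the log-file path is the placeholder from the guide):

    /* SimpleApp.scala */
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // adds reduceByKey etc. to RDD[(K, V)] via implicit conversion

    object SimpleApp {
      def main(args: Array[String]) {
        val logFile = "YOUR_SPARK_HOME/README.md"   // placeholder path from the quick start
        val sc = new SparkContext("local", "Simple App")
        val logData = sc.textFile(logFile).cache()
        val wordCounts = logData
          .flatMap(line => line.split(" "))
          .map(word => (word, 1))
          .reduceByKey((a, b) => a + b)             // compiles now that the import is in scope
        wordCounts.take(10).foreach(println)
        sc.stop()
      }
    }

From Spark 1.3 onward these implicit conversions are applied automatically, but for 0.9.x the explicit import is needed.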