rdd tutorials and guides

Treat Spark RDD like plain Seq

Nov 01, 2022

How to add columns of 2 RDDs to form a single RDD and then aggregate rows based on date data in PySpark

Nov 02, 2022

python apache-spark aggregate pyspark rdd

Spark MLlib FPGrowth job fails with Memory Error

Nov 01, 2022

apache-spark rdd apache-spark-mllib

Counting distinct texts in a Spark RDD with array objects

Oct 31, 2022

python apache-spark pyspark rdd

Concurrent transformations on RDD in foreachRDD function of Spark DStream

Nov 01, 2022

java apache-spark spark-streaming rdd dstream

Misunderstanding of Spark RDD fault tolerance

Oct 30, 2022

apache-spark spark-streaming rdd distributed-computing fault-tolerance

What is the difference between rdd.repartition() and partition size in sc.parallelize(data, partitions)?

Oct 21, 2022

python apache-spark rdd

pyspark: "too many values" error after repartitioning

Oct 21, 2022

python apache-spark apache-spark-sql pyspark rdd

Fail to apply mapping on an RDD on multiple Spark nodes through Elasticsearch-hadoop library

Oct 20, 2022

scala elasticsearch apache-spark rdd elasticsearch-hadoop

No Java class corresponding to Product with Serializable with Base found

Oct 20, 2022

java scala apache-spark rdd apache-spark-dataset

Why would one use DataFrame.select over DataFrame.rdd.map (or vice versa)?

Oct 20, 2022

performance apache-spark dataframe apache-spark-sql rdd

Creating a custom Spark RDD in Python

Sep 28, 2022

python apache-spark pyspark rdd

Caching factor of MatrixFactorizationModel in PySpark

Sep 29, 2022

apache-spark pyspark rdd apache-spark-mllib

Convert JSON objects to RDD

Sep 29, 2022

json scala apache-spark rdd

When creating two different Spark pair RDDs with the same key set, will Spark distribute partitions with the same key to the same machine?

Sep 28, 2022

scala join apache-spark rdd

Are recursive computations with Apache Spark RDD possible?

Aug 26, 2022

scala apache-spark recursion rdd chess

Which Spark operations are processed in parallel?

Aug 29, 2022

apache-spark spark-streaming rdd

Spark RDDs - how do they work?

Sep 10, 2022

scala apache-spark bigdata distributed-computing rdd

Does a join of co-partitioned RDDs cause a shuffle in Apache Spark?

Sep 07, 2022

apache-spark spark-streaming rdd

How to extract an element from an array in PySpark

Sep 09, 2022

python apache-spark pyspark rdd

New posts in rdd