apache-spark tutorials and guides

How to calculate the mean of each pair in an RDD consisting of (Key, [Value]) pairs in Spark?

Jul 12, 2022

scala apache-spark

How to create a VertexId in Apache Spark GraphX using a Long data type?

Nov 04, 2022

scala apache-spark spark-graphx

Error when starting the Spark shell

Aug 11, 2022

apache-spark

java.util.HashMap missing in PySpark session

Jan 27, 2018

python apache-spark pyspark py4j

Elasticsearch + Apache Spark performance

Jul 06, 2022

elasticsearch apache-spark apache-spark-sql

EMR PySpark: LZO Codec not found

Apr 10, 2020

apache-spark hdfs pyspark emr

Spark streaming + json4s-jackson dependency problems

Feb 28, 2021

maven apache-spark jackson spark-streaming json4s

In Apache Spark, how do I add sparse vectors?

Nov 04, 2022

scala apache-spark scala-breeze

SparkSQL - Lag function?

May 24, 2019

sql apache-spark pyspark apache-spark-sql window-functions

How to configure checkpointing to redeploy a Spark Streaming application?

Jul 11, 2022

apache-spark bigdata spark-streaming

Spark + Kafka integration - mapping of Kafka partitions to RDD partitions

Oct 22, 2022

scala apache-spark apache-kafka spark-streaming apache-spark-1.4

Spark - Adding JDBC Driver JAR to Google Dataproc

Nov 17, 2022

apache-spark jdbc google-cloud-platform apache-spark-sql google-cloud-dataproc

Do parquet files preserve the row order of Spark DataFrames?

Nov 01, 2022

apache-spark apache-spark-sql parquet

Not enough space to cache rdd in memory warning

Oct 07, 2019

amazon-web-services amazon-s3 apache-spark rdd

How does the number of partitions affect `wholeTextFiles` and `textFile`?

Jan 09, 2020

python apache-spark pyspark

Regrouping / Concatenating DataFrame rows in Spark

Nov 18, 2022

scala apache-spark dataframe apache-spark-sql apache-spark-ml

A quick guide to a Salt-based install of a Spark cluster

Feb 08, 2022

apache-spark hdfs salt-stack

What are the pros and cons of using broadcast variables in a singleton?

Nov 02, 2022

java apache-spark broadcast

Spark: why are tasks assigned to only one worker?

Jul 22, 2022

apache-spark

Spark-HBase error java.lang.IllegalStateException: unread block data

Dec 21, 2021

apache-spark hbase apache-spark-sql
