apache-spark tutorials and guides

How to efficiently remove duplicate rows in a Spark DataFrame, keeping the row with the highest timestamp

Feb 09, 2026

sql scala apache-spark

Merging RDDs using Scala Apache Spark

Feb 09, 2026

java scala apache-spark

Server-side filtering of spark-cassandra on PySpark

Feb 09, 2026

python apache-spark cassandra pyspark apache-spark-sql

How to rename fields in a DataFrame corresponding to nested JSON

Feb 08, 2026

apache-spark apache-spark-sql

Merge rows in Apache Spark by eliminating null values

Feb 08, 2026

python scala apache-spark pyspark apache-spark-sql

How to read checkpointed RDD

Feb 09, 2026

scala apache-spark

Why is Spark creating multiple jobs for one action?

Feb 08, 2026

python apache-spark pyspark databricks

SparkSQL errors when using SQL DATE function

Feb 07, 2026

sql scala apache-spark apache-spark-sql

Elasticsearch support for Spark 2.4.2 with Scala 2.12

Feb 09, 2026

apache-spark elasticsearch spark-structured-streaming

How does spark.csv determine the number of partitions on read?

Feb 09, 2026

apache-spark

Cross-Version Conflicts with Spark and Azure-Cosmosdb

Feb 08, 2026

scala azure apache-spark sbt azure-cosmosdb

Printing ClusterID and its elements using the Spark KMeans algorithm

Feb 08, 2026

apache-spark k-means apache-spark-mllib

Spark Structured Streaming - Empty dictionary on new batch

Feb 08, 2026

python apache-spark dictionary pyspark spark-structured-streaming

How can I iterate Spark's DataFrame rows?

Feb 07, 2026

scala apache-spark dataframe iterator

Can't run LDA on Dataset[(scala.Long, org.apache.spark.mllib.linalg.Vector)] in Spark 2.0

Feb 08, 2026

scala apache-spark apache-spark-mllib

Pass List[String] or Seq[String] to groupBy in Spark [duplicate]

Feb 08, 2026

scala apache-spark apache-spark-sql
