apache-spark tutorials and guides

Spark streaming job doesn't delete shuffle files

Jan 03, 2026

apache-spark apache-kafka spark-streaming

Spark RDD: How to calculate statistics most efficiently?

Jan 03, 2026

apache-spark pyspark distributed-computing rdd apache-spark-mllib

Explode column with array of arrays - PySpark

Jan 03, 2026

python arrays apache-spark pyspark databricks

Caching DataFrame in Spark Thrift Server

Jan 03, 2026

apache-spark apache-spark-sql spark-thriftserver

Spark dense_rank window function - without a partitionBy clause

Jan 03, 2026

sql-server scala apache-spark hadoop-yarn hadoop2

How to delete documents (records) with the Mongo-Hadoop connector for Spark

Jan 02, 2026

mongodb hadoop apache-spark apache-spark-sql mongodb-hadoop

Spark Streaming Kafka Stream batch execution

Jan 02, 2026

java apache-spark apache-kafka spark-streaming spark-streaming-kafka

Why does a Spark application fail with java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig even though the jar exists?

Jan 02, 2026

scala apache-spark pyspark

Executing a Zeppelin notebook without manual intervention

Jan 02, 2026

apache-spark apache-spark-sql apache-zeppelin spark-submit

Scala-Spark flattening nested schema contains array

Jan 01, 2026

scala apache-spark apache-spark-sql

Unable to initialize main class org.apache.spark.deploy.SparkSubmit when trying to run pyspark

Jan 02, 2026

python apache-spark pyspark conda

Null check for Double/Int Value in Spark

Jan 02, 2026

scala hadoop apache-spark hive

How to divide a numerical column into ranges and assign labels for each range in Apache Spark?

Jan 02, 2026

apache-spark dataframe pyspark apache-spark-sql hivecontext

Spark/Gradle -- Getting IP Address in build.gradle to use for starting master and workers

Jan 02, 2026

apache-spark gradle groovy build.gradle

How to specify the group id of kafka consumer for spark structured streaming?

Jan 02, 2026

apache-spark apache-spark-sql spark-streaming spark-streaming-kafka

Get local time in PySpark dependent on a column

Jan 01, 2026

python datetime apache-spark pyspark apache-spark-sql

Playframework & Spark

Jan 01, 2026

playframework apache-spark

Cache not preventing multiple filescans?

Dec 31, 2025

apache-spark dataframe caching
