apache-spark tutorials and guides

Spark Streaming: NullPointerException inside foreachPartition

Oct 21, 2025

scala apache-spark spark-streaming

Is there a way to perform a cast or withColumn dataframe operation in PySpark without breaking a function chain?

Oct 21, 2025

python apache-spark pyspark apache-spark-sql

spark-submit yarn-cluster with --jars does not work?

Oct 21, 2025

java hadoop apache-spark hadoop-yarn cloudera-cdh

conditional aggregation using pyspark

Oct 21, 2025

python apache-spark pyspark apache-spark-sql

Spark ML gradient boosted trees not using all nodes

Oct 20, 2025

python apache-spark pyspark apache-spark-ml

PySpark to_json loses column name of struct inside array

Oct 18, 2025

python dataframe apache-spark pyspark apache-spark-sql

How to do a recursive self-join in Foundry Contour?

Oct 21, 2025

apache-spark pyspark apache-spark-sql palantir-foundry foundry-contour

structured streaming writing to multiple streams

Oct 19, 2025

apache-spark spark-structured-streaming azure-databricks

Expand column with array of structs into new columns

Oct 21, 2025

apache-spark pyspark

Why does spark-submit ignore the package that I include as part of the configuration of my spark session?

Oct 19, 2025

apache-spark pyspark apache-spark-sql

Pyspark partition data by a column and write parquet

Oct 21, 2025

dataframe apache-spark pyspark

Save DataFrame to Table - performance in Pyspark

Oct 19, 2025

apache-spark pyspark hive

Error "Invalid call to qualifier on unresolved object" when trying to write a Spark DF into a Hive table

Oct 20, 2025

scala apache-spark hive apache-spark-sql orc

How Do I Enable Fair Scheduler in PySpark?

Oct 21, 2025

java apache-spark pyspark

Disable Ivy Logging when using Spark-submit

Oct 21, 2025

apache-spark pyspark

What is ShuffleQueryStage in the Spark DAG?

Oct 20, 2025

apache-spark pyspark apache-spark-sql spark-ui

Pyspark: Calculate streak of consecutive observations

Oct 19, 2025

apache-spark pyspark apache-spark-sql

OR condition in dataframe full outer join reducing performance spark/scala

Oct 20, 2025

scala apache-spark join apache-spark-sql

LDA cross validation evaluator

Oct 18, 2025

scala apache-spark apache-spark-mllib apache-spark-ml

how to use list comprehension variable names in Pyspark dataframes

Oct 19, 2025

python apache-spark pyspark
