apache-spark tutorials and guides

Spark ML gradient boosted trees not using all nodes

Oct 20, 2025

PySpark to_json loses column name of struct inside array

Oct 18, 2025

python dataframe apache-spark pyspark apache-spark-sql

How to do a recursive self-join in Foundry Contour?

Oct 21, 2025

apache-spark pyspark apache-spark-sql palantir-foundry foundry-contour

Structured Streaming writing to multiple streams

Oct 19, 2025

apache-spark spark-structured-streaming azure-databricks

Expand column with array of structs into new columns

Oct 21, 2025

apache-spark pyspark

Why does spark-submit ignore the package that I include as part of the configuration of my spark session?

Oct 19, 2025

apache-spark pyspark apache-spark-sql

PySpark partition data by a column and write Parquet

Oct 21, 2025

dataframe apache-spark pyspark

Save DataFrame to Table - performance in PySpark

Oct 19, 2025

apache-spark pyspark hive

Error "Invalid call to qualifier on unresolved object" when trying to write a Spark DF into a Hive table

Oct 20, 2025

scala apache-spark hive apache-spark-sql orc

How Do I Enable Fair Scheduler in PySpark?

Oct 21, 2025

java apache-spark pyspark

Disable Ivy logging when using spark-submit

Oct 21, 2025

apache-spark pyspark

What is ShuffleQueryStage in the Spark DAG?

Oct 20, 2025

apache-spark pyspark apache-spark-sql spark-ui

PySpark: Calculate streak of consecutive observations

Oct 19, 2025

apache-spark pyspark apache-spark-sql

OR condition in DataFrame full outer join reducing performance (Spark/Scala)

Oct 20, 2025

scala apache-spark join apache-spark-sql

LDA cross validation evaluator

Oct 18, 2025

scala apache-spark apache-spark-mllib apache-spark-ml

How to use list comprehension variable names in PySpark DataFrames

Oct 19, 2025

python apache-spark pyspark

FileNotFoundException on _temporary/0 directory when saving Parquet files

Oct 20, 2025

python apache-spark hadoop hdfs azure-blob-storage

Spark Build Fails Because Of Avro Mapred Dependency

Oct 19, 2025

scala apache-spark

Databricks - pyspark.pandas.DataFrame.to_excel does not recognize abfss protocol

Oct 21, 2025

python pandas azure apache-spark azure-databricks

How to create a managed Hive table with a specified location through Spark SQL?

Oct 20, 2025

apache-spark amazon-s3 hive apache-spark-sql
