apache-spark tutorials and guides

How to change column metadata in PySpark?

Aug 27, 2022

How to write rows asynchronously in Spark Streaming application to speed up batch execution?

Jun 27, 2022

performance apache-spark apache-spark-sql spark-streaming

spark-sql Table or view not found error

Jun 12, 2018

apache-spark apache-spark-sql spark-dataframe

How to join/merge a list of dataframes with common keys in PySpark?

Sep 28, 2022

python apache-spark pyspark apache-spark-sql

How to display a streaming DataFrame (as show fails with AnalysisException)?

Sep 12, 2022

apache-spark pyspark apache-kafka spark-structured-streaming

How to force repartitioning in a Spark DataFrame?

Oct 29, 2022

python apache-spark pyspark distributed

Eclipse remote debug spark-submit

Nov 01, 2022

apache-spark

How to create schema (StructType) with one or more StructTypes?

Jun 23, 2022

scala apache-spark apache-spark-sql

How to convert a nested Avro GenericRecord to a Row?

Oct 03, 2021

java apache-spark avro spark-avro

PySpark aggregation function for "any value"

Oct 24, 2022

python apache-spark pyspark apache-spark-sql coalesce

Saving empty DataFrame with known schema (Spark 2.2.1)

Oct 27, 2022

apache-spark parquet databricks

Why does array_contains accept columns for both arguments in SQL but not in Dataset API?

Mar 12, 2022

apache-spark apache-spark-sql

Spark Structured Streaming - Limitations? (Source Performance, Unsupported Operations, Spark UI)

Mar 09, 2022

apache-spark apache-kafka spark-structured-streaming

Incompatible Jackson version: Spark Structured Streaming

Jun 18, 2022

scala apache-spark sbt apache-spark-sql

Number of dataframe partitions after sorting?

Oct 25, 2022

apache-spark apache-spark-sql

Drop rows containing a specific value in a PySpark DataFrame

Sep 21, 2022

apache-spark pyspark apache-spark-sql pyspark-sql

Does Spark distribute a DataFrame across nodes internally?

Nov 13, 2022

apache-spark pyspark apache-spark-sql

How to specify batch interval in Spark Structured Streaming?

Jul 17, 2022

apache-spark pyspark spark-structured-streaming

How to concatenate multiple columns in PySpark with a separator?

Sep 20, 2022

apache-spark pyspark apache-spark-sql

Spark Window aggregation vs. Group By/Join performance

Aug 22, 2022

apache-spark apache-spark-sql
