apache-spark tutorials and guides

Getting error "need struct type but got string" in Spark Scala for a simple struct type

Sep 13, 2025

scala apache-spark apache-spark-sql

PySpark: how to add a row number to a DataFrame without changing the order?

Sep 14, 2025

python dataframe apache-spark pyspark apache-spark-sql

PySpark cannot infer timestamp even with timestampFormat

Sep 13, 2025

apache-spark pyspark date-formatting

How to add partitioning to existing Iceberg table

Sep 13, 2025

scala apache-spark apache-spark-sql apache-iceberg

Configure EMR Cluster for Fair Scheduling

Sep 13, 2025

hadoop apache-spark emr amazon-emr

Collect only not null columns of each row to an array

Sep 13, 2025

apache-spark apache-spark-sql

Read data from Kafka and print to console with Spark Structured Streaming in Python

Sep 13, 2025

apache-spark pyspark apache-kafka apache-spark-sql spark-structured-streaming

Spark pivot invokes Job even though pivot is not an Action

Sep 13, 2025

apache-spark apache-spark-sql

Which is faster in Scala: spark.sql or df.filter("").select("")?

Sep 13, 2025

scala apache-spark apache-spark-sql

No applicable constructor/method found for zero actual parameters - Apache Spark Java

Sep 13, 2025

java apache-spark apache-spark-sql apache-spark-dataset

How to avoid empty files while writing parquet files?

Sep 13, 2025

apache-spark pyspark spark-structured-streaming

Shut down Spark Structured Streaming gracefully

Sep 13, 2025

apache-spark apache-spark-sql spark-streaming spark-structured-streaming

Spark agg to collect a single list for multiple columns

Sep 12, 2025

scala apache-spark group-by apache-spark-sql

TypeError converting a Pandas Dataframe to Spark Dataframe in Pyspark

Sep 12, 2025

python pandas apache-spark pyspark

PySpark map type contains duplicate keys

Sep 13, 2025

python apache-spark pyspark apache-spark-sql

Spark: apply a function to columns in parallel

Sep 12, 2025

scala apache-spark parallel-processing apache-spark-sql

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext

Sep 11, 2025

python apache-spark tensorflow pyspark jupyter-notebook

Apache Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs

Sep 11, 2025

hadoop memory memory-management apache-spark out-of-memory

Cost of an Azure Databricks cluster running but not executing any Spark app [closed]

Sep 09, 2025

azure apache-spark azure-databricks

Dataproc doesn't import Python module stored in Google Cloud Storage bucket

Sep 10, 2025

python apache-spark pyspark python-import google-cloud-dataproc
