apache-spark tutorials and guides

Are built-in Spark transformations faster than Spark SQL queries?

Oct 30, 2025

Nested JSON: extract the value with an unknown key in the middle

Oct 30, 2025

json scala apache-spark apache-spark-sql scala-collections

Sparklyr/Dplyr - How to apply a user-defined function to each row of a Spark data frame and write the output of each row to a new column?

Oct 30, 2025

r apache-spark dplyr apache-spark-sql sparklyr

How do I connect to a Kerberos-secured Kafka cluster with Spark Structured Streaming?

Oct 30, 2025

scala apache-spark apache-kafka kerberos

How to select an exact number of random rows from DataFrame

Oct 30, 2025

apache-spark random apache-spark-sql

Pandas-on-Spark throwing java.lang.StackOverflowError

Oct 30, 2025

python pandas apache-spark pyspark pyspark-pandas

Spark ML: Taking square root of feature columns

Oct 28, 2025

apache-spark pyspark apache-spark-mllib apache-spark-ml

How to write a Spark DataFrame to a Neo4j database

Oct 30, 2025

apache-spark neo4j apache-spark-sql

Unable to overwrite default value of "spark.sql.shuffle.partitions" with Spark Structured Streaming

Oct 30, 2025

scala apache-spark spark-structured-streaming

Delta table statistics

Oct 29, 2025

apache-spark logging pyspark statistics delta-lake

Spark Streaming with mapGroupsWithState

Oct 29, 2025

scala apache-spark databricks spark-structured-streaming

Stop Hive's RetryingHMSHandler logging to Databricks cluster

Oct 29, 2025

apache-spark log4j slf4j azure-databricks

Spark: write data with SaveMode Append or Overwrite

Oct 30, 2025

scala apache-spark apache-spark-sql

Explanation of the fold method of Spark RDD

Oct 29, 2025

scala apache-spark rdd

spark-submit --packages is not working on my cluster; what could be the reason?

Oct 29, 2025

scala maven apache-spark

Is Spark's overwrite save mode atomic?

Oct 29, 2025

apache-spark

Load to BigQuery via Spark job fails with an exception: Multiple sources found for parquet

Oct 30, 2025

scala apache-spark google-bigquery google-cloud-dataproc

How to monitor Spark job with Airflow

Oct 28, 2025

apache-spark airflow
