apache-spark tutorials and guides

How do I download a large list of URLs in parallel in PySpark?

Nov 01, 2025

Rename written CSV file Spark

Oct 31, 2025

apache-spark amazon-s3 apache-spark-sql

How to merge a list of lists into a single list in PySpark

Nov 01, 2025

apache-spark dataframe pyspark

How to extract tables with data from .sql dumps using Spark?

Nov 01, 2025

mysql scala apache-spark

Drop a column in a table/view using Spark SQL only

Oct 31, 2025

apache-spark apache-spark-sql string-interpolation

Why are there two options to read a CSV file in PySpark? Which one should I use?

Oct 31, 2025

python apache-spark pyspark apache-spark-2.0

How to create a co-occurrence matrix from a Spark RDD

Nov 01, 2025

scala apache-spark

How many concurrent tasks in one executor and how Spark handles multithreading among tasks in one executor?

Nov 01, 2025

java multithreading apache-spark concurrency hadoop-yarn

IllegalArgumentException: A project ID is required for this service but could not be determined from the builder or the environment

Oct 31, 2025

apache-spark pyspark google-bigquery databricks databricks-connect

java.lang.NoClassDefFoundError: jakarta/servlet/SingleThreadModel - Error while using Apache Spark 4.0-preview1

Nov 01, 2025

java spring-boot apache-spark apache-spark-sql

PySpark Mapping Elements in Array within a Dataframe to another Dataframe

Oct 31, 2025

python dataframe apache-spark pyspark

SparkSession does not pull down packages from repo in pytest suite

Oct 31, 2025

apache-spark pyspark pytest

StringType issue: Exception in thread "main" scala.MatchError: org.apache.spark.sql.types.StringType@

Nov 01, 2025

java scala apache-spark

Not able to retain the corrupted rows in PySpark using PERMISSIVE mode

Oct 31, 2025

python csv apache-spark pyspark

Spark Join of 2 dataframes which have 2 different column names in list

Oct 31, 2025

scala apache-spark join

Understanding lambda function inputs in Spark for RDDs

Oct 31, 2025

python apache-spark lambda pyspark
