apache-spark tutorials and guides

SparkContext parallelize invocation example in Java

May 21, 2026

java apache-spark

Infinite loop of Resetting offset and seeking for LATEST offset

May 19, 2026

java apache-spark apache-kafka kafka-consumer-api

Optimizing Spark resources to avoid memory and space usage

May 21, 2026

apache-spark pyspark amazon-emr

Pyspark toPandas() Out of bounds nanosecond timestamp error

May 20, 2026

python pandas apache-spark pyspark apache-spark-sql

"Python was not found but can be installed" when using spark-submit on Windows

May 21, 2026

python apache-spark pyspark

Setup Apache Sedona on EMR

May 20, 2026

amazon-web-services apache-spark jar amazon-emr

spark scala get uncommon map elements

May 21, 2026

scala machine-learning apache-spark

AWS EKS Spark 3.0, Hadoop 3.2 Error - NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException

May 21, 2026

apache-spark hadoop amazon-s3 amazon-eks

Spark: Faster way to join two DataFrames?

May 19, 2026

scala apache-spark

How can I change HDFS replication factor for my Spark program?

May 20, 2026

scala hadoop apache-spark hdfs

Spark is telling me that the features column is wrong

May 19, 2026

java apache-spark apache-spark-mllib apache-spark-ml

DataType.fromJSON() - java.lang.IllegalArgumentException: Failed to convert the JSON string <> to a data type

May 20, 2026

scala apache-spark

Spark mapPartitionsWithIndex : Identify a partition

May 21, 2026

scala apache-spark rdd hadoop-partitioning

Spark SQL on Cassandra table that is populated with Spark Streaming

May 20, 2026

apache-spark cassandra apache-spark-sql spark-streaming

Sparklyr connection to S3 bucket throwing an error

May 20, 2026

r apache-spark amazon-s3 sparklyr

How many consumers are created to read records per direct stream?

May 19, 2026

apache-spark apache-kafka spark-streaming

Check if values of a column in one PySpark DataFrame exist in a column of another PySpark DataFrame

May 19, 2026

python dataframe apache-spark pyspark apache-spark-sql

pySpark .join() with different column names and can't be hard coded before runtime

May 20, 2026

apache-spark pyspark apache-spark-sql

Setting data lake connection in cluster Spark Config for Azure Databricks

May 20, 2026

apache-spark azure-databricks azure-data-lake-gen2

How do I handle errors in mapped functions in AWS Glue?

May 19, 2026

apache-spark pyspark aws-glue
