apache-spark tutorials and guides

How to get Kafka header's value to Spark Dataset as a single column?

Oct 17, 2025

When using Spark Structured Streaming, how to get only the aggregation result of the current batch, like Spark Streaming?

Oct 17, 2025

apache-spark spark-streaming spark-structured-streaming

How to load a spark-nlp pre-trained model from disk

Oct 17, 2025

scala apache-spark nlp apache-spark-mllib johnsnowlabs-spark-nlp

Pyspark error with UDF: py4j.Py4JException: Method getnewargs([]) does not exist error

Oct 17, 2025

python apache-spark pyspark databricks

Spark job on GCP Dataproc failing with error - java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.<init>(ZIIIIIIZ)V

Oct 16, 2025

apache-spark google-cloud-platform google-cloud-dataproc

What happens if a Spark broadcast join is too large?

Oct 16, 2025

apache-spark

PySpark 2.0 - IndexToString Error

Oct 16, 2025

apache-spark pyspark apache-spark-ml

How to row bind two Spark dataframes using sparklyr?

Oct 16, 2025

r apache-spark dplyr sparklyr

Read SAS sas7bdat data with Spark

Oct 14, 2025

apache-spark pyspark sas

Error when parsing html in Spark Dataframe

Oct 17, 2025

python apache-spark beautifulsoup pyspark

Understanding output of Word2Vec transform method

Oct 17, 2025

python apache-spark pyspark apache-spark-mllib apache-spark-ml

Adding JDBC driver to Spark on EMR

Oct 17, 2025

jdbc apache-spark amazon-emr

Iterate each row in a dataframe, store it in val and pass as parameter to Spark SQL query

Oct 16, 2025

scala dataframe apache-spark apache-spark-sql

Cannot connect to Cassandra in spark-shell

Oct 17, 2025

scala apache-spark cassandra spark-cassandra-connector

Spark DataFrame to Postgres using COPY command - PySpark

Oct 17, 2025

postgresql apache-spark pyspark

Remove constant columns from an RDD and compute the covariance matrix

Oct 17, 2025

scala apache-spark covariance rdd

Error when using the DataFrame show method in PySpark

Oct 17, 2025

python apache-spark pyspark apache-spark-mllib

pyspark when/otherwise clause failure when using udf

Oct 17, 2025

python apache-spark pyspark apache-spark-sql user-defined-functions

Spark Scheduler vs Standalone Scheduler in the Spark Stack

Oct 17, 2025

apache-spark architecture

java.lang.NoSuchMethodError when reading an avro file using PySpark

Oct 16, 2025

apache-spark pyspark google-cloud-dataproc spark-avro
