apache-spark tutorials and guides

java.lang.UnsupportedOperationException: Error in spark when writing

Oct 19, 2022

apache-spark apache-spark-dataset

How does Spark handle failure scenarios involving JDBC data source?

Oct 18, 2022

scala apache-spark jdbc apache-spark-sql

Spark using recursive case class

Oct 18, 2022

scala apache-spark apache-spark-sql apache-spark-dataset

How to integrate HIVE access into PySpark derived from pip and conda (not from a Spark distribution or package)

Oct 19, 2022

python apache-spark hive pyspark hive-metastore

Window Function Tie breaker on other field to get the Latest Record

Oct 18, 2022

sql apache-spark pyspark apache-spark-sql pyspark-sql

How to set optimal config values - trigger time, maxOffsetsPerTrigger - for Spark Structured Streaming while reading messages from Kafka?

Oct 17, 2022

apache-spark apache-kafka spark-streaming spark-structured-streaming

structured streaming Kafka 2.1->Zeppelin 0.8->Spark 2.4: spark does not use jar

Oct 18, 2022

python apache-spark pyspark apache-kafka apache-zeppelin

Cross account GCS access using Spark on Dataproc

Oct 18, 2022

apache-spark google-cloud-platform google-bigquery google-cloud-storage google-cloud-dataproc

How to overwrite a parquet file from where DataFrame is being read in Spark

Oct 18, 2022

python apache-spark metadata parquet

Spark: Dataframe Serialization

Jun 14, 2022

scala apache-spark serialization spark-dataframe kryo

How can PySpark be called in debug mode?

Sep 11, 2022

python python-2.7 hadoop intellij-idea apache-spark

spark streaming checkpoint recovery is very very slow

Sep 13, 2022

apache-spark amazon-s3 spark-streaming amazon-kinesis checkpointing

How to change case of whole column to lowercase?

Oct 03, 2022

java apache-spark apache-spark-sql apache-spark-dataset

Spark Standalone Mode: How to compress spark output written to HDFS

Feb 23, 2022

scala compression hdfs apache-spark

Error to start pre-built spark-master when slf4j is not installed

Oct 30, 2022

apache-spark

pyspark addPyFile to add zip of .py files, but module still not found

May 12, 2022

apache-spark pyspark

Spark Structured Streaming automatically converts timestamp to local time

Nov 18, 2022

java scala apache-spark apache-spark-sql spark-structured-streaming

Spark : Read file only if the path exists

Apr 30, 2022

scala apache-spark parquet

Spark and Not Serializable DateTimeFormatter

Nov 04, 2022

java scala serialization apache-spark

Removing duplicate columns after a DF join in Spark

Oct 15, 2022

python pyspark apache-spark apache-spark-sql
