apache-spark tutorials and guides

Setting up dynamic allocation in Apache Spark?

Oct 25, 2022

apache-spark hadoop-yarn

Spark Local Mode - all jobs only use one CPU core

Oct 27, 2022

java amazon-web-services apache-spark amazon-ec2

Spark: joining DataFrames with a one-to-many relationship

Nov 07, 2022

apache-spark

Cannot change hive.exec.max.dynamic.partitions in Spark

Oct 22, 2022

apache-spark hive

How to automate StructType creation for passing RDD to DataFrame

Feb 10, 2022

scala apache-spark spark-dataframe rdd

How to expose Spark Driver behind dockerized Apache Zeppelin?

May 17, 2022

apache-spark docker apache-zeppelin

Running from a local IDE against a remote Spark cluster

Oct 26, 2022

hadoop apache-spark hadoop-yarn kerberos cloudera-cdh

spark streaming assertion failed: Failed to get records for spark-executor-a-group a-topic 7 244723248 after polling for 4096

Mar 17, 2022

apache-spark apache-kafka spark-streaming

How Spark HashingTF works

Nov 07, 2022

apache-spark pyspark apache-spark-mllib tf-idf apache-spark-ml

Spark load settings from multiple configuration files

May 14, 2022

apache-spark

How to convert bytes from Kafka to their original object?

Nov 07, 2021

apache-spark apache-kafka spark-streaming spark-avro

Spark cosine distance between rows using Dataframe

Jan 18, 2022

apache-spark pyspark spark-dataframe cosine-similarity

PCA output in Spark doesn't match scikit-learn

Aug 24, 2019

python apache-spark pyspark pca apache-spark-ml

Using Spark Structured Streaming to read data from Kafka: over-time issue always occurs

Apr 19, 2021

apache-spark apache-kafka spark-structured-streaming

Caching dataframes while keeping partitions

Nov 08, 2022

apache-spark

Can't pickle _thread.lock objects when PySpark sends requests to Elasticsearch

Jun 28, 2022

python apache-spark elasticsearch pyspark

AnalysisException: Queries with streaming sources must be executed with writeStream.start()

Jan 19, 2020

apache-spark spark-structured-streaming

Watermarking for Spark structured streaming with three way joins

May 30, 2022

scala apache-spark spark-structured-streaming

Connecting MySQL with PySpark

Apr 21, 2022

python mysql apache-spark pyspark

Spark Dataset when to use Except vs Left Anti Join

Nov 09, 2022

apache-spark apache-spark-sql anti-join
