apache-spark tutorials and guides

Bundling Python3 packages for PySpark results in missing imports

Oct 17, 2022

Restarting Spark Structured Streaming Job consumes Millions of Kafka messages and dies

Sep 17, 2022

apache-spark pyspark spark-streaming spark-structured-streaming

Spark: How to get the number of keys changed between two JSONs in Scala?

Mar 30, 2021

json scala apache-spark apache-spark-sql

Apache Spark: impact of repartitioning, sorting and caching on a join

Nov 04, 2022

apache-spark pyspark bigdata azure-databricks delta-lake

How to convert org.apache.spark.rdd.RDD[Array[Double]] to Array[Double] which is required by Spark MLlib

Apr 15, 2018

apache-spark apache-spark-mllib

Using Spark ML's OneHotEncoder on multiple columns

Oct 26, 2020

scala apache-spark apache-spark-ml

Spark performs slower with hardware scaling up

Jun 20, 2019

performance apache-spark

How does spark.python.worker.memory relate to spark.executor.memory?

Feb 24, 2022

memory apache-spark pyspark hadoop-yarn

How do I enable partition pruning in Spark?

Jun 26, 2019

apache-spark apache-spark-sql spark-dataframe pruning

How to read records from Kafka topic from beginning in Spark Streaming?

Aug 31, 2022

scala apache-spark apache-kafka spark-streaming

How to get the execution DAG from the Spark web UI after a job has finished running, when running Spark on YARN?

Nov 03, 2022

apache-spark pyspark hadoop-yarn

How to save a file on the cluster

Aug 22, 2022

python apache-spark pyspark hdfs spark-submit

Is sample_n really a random sample when used with sparklyr?

Jan 31, 2022

r apache-spark random dplyr sparklyr

How to pre-package external libraries when using Spark on a Mesos cluster

Apr 19, 2022

scala apache-spark mesos mesosphere

Remove Empty Partitions from Spark RDD

Oct 17, 2022

hadoop apache-spark pyspark rdd

Spark 1.5.2 and SLF4J StaticLoggerBinder

Nov 27, 2021

java scala hadoop apache-spark sbt

Guava version while using spark-shell

Feb 12, 2019

apache-spark spark-cassandra-connector google-cloud-dataproc

Spark Shell - __spark_libs__.zip does not exist

Nov 11, 2022

hadoop apache-spark hadoop-yarn

Integrate key-value database with Spark

Feb 20, 2022

hadoop apache-spark rocksdb

What are spark.local.ip, spark.driver.host, spark.driver.bindAddress and spark.driver.hostname?

Apr 11, 2022

apache-spark
