apache-spark tutorials and guides

How does Parquet determine which encoding to use?

Oct 16, 2022

Scala module requiring a specific version of jackson-databind for Spark

Oct 17, 2022

java scala apache-spark jackson-databind

How to load a word2vec model and call its function in the mapper

Apr 22, 2020

apache-spark pyspark apache-spark-mllib word2vec

Saving ordered dataframe in Spark

Oct 27, 2022

hadoop apache-spark dataframe

How to debug the function passed to mapPartitions

Mar 06, 2020

apache-spark mapreduce pyspark partitioning

Remove new line from CSV file

Sep 10, 2022

python csv apache-spark newline

PySpark: DataFrame with multiple array columns into multiple rows with one value each

Oct 14, 2022

python dataframe apache-spark pyspark apache-spark-sql

Spark application throws javax.servlet.FilterRegistration

Aug 08, 2022

scala intellij-idea sbt apache-spark

How do I call a UDF on a Spark DataFrame using Java?

Feb 12, 2020

java apache-spark apache-spark-sql user-defined-functions

How to create a custom Estimator in PySpark

May 21, 2020

python apache-spark pyspark apache-spark-mllib apache-spark-ml

Are failed tasks resubmitted in Apache Spark?

Oct 02, 2022

apache-spark

Spark SQL queries vs DataFrame functions

Oct 21, 2022

sql performance apache-spark dataframe apache-spark-sql

Spark: long delay between jobs

Dec 12, 2018

scala hadoop apache-spark

SparkContext Error - File not found /tmp/spark-events does not exist

Oct 05, 2021

python amazon-web-services apache-spark amazon-ec2 pyspark

Comparing columns in PySpark

Oct 02, 2022

python apache-spark pyspark

Why does vcore always equal the number of nodes in Spark on YARN?

Sep 15, 2022

apache-spark hadoop-yarn

Is Spark DataFrame nested structure limited for selection?

Sep 08, 2022

apache-spark apache-spark-sql

ValueError: Cannot run multiple SparkContexts at once in Spark with PySpark

Aug 16, 2022

python-3.x apache-spark pyspark

Failed to bind to: spark-master, using a remote cluster with two workers

Jun 01, 2022

binding apache-spark runtime-error

Spark iteration time increasing exponentially when using join

Sep 28, 2021

python loops apache-spark iteration pyspark

New posts in apache-spark