apache-spark tutorials and guides

How to use JDBC source to write and read data in (Py)Spark?

Aug 30, 2022

Cannot find col function in PySpark

Aug 18, 2022

python apache-spark pyspark apache-spark-sql pyspark-sql

PySpark DataFrame: filter or include based on a list

Aug 18, 2022

apache-spark filter pyspark apache-spark-sql

How to filter out null values from a Spark DataFrame

Aug 24, 2022

scala apache-spark apache-spark-sql spark-dataframe

How to find median and quantiles using Spark

Aug 18, 2022

python apache-spark median rdd pyspark

PySpark: Split multiple array columns into rows

Nov 09, 2022

python apache-spark dataframe pyspark apache-spark-sql

What is the relationship between workers, worker instances, and executors?

Aug 18, 2022

apache-spark apache-spark-standalone

Is it possible to get the current spark context settings in PySpark?

Aug 17, 2022

apache-spark config pyspark

How to pivot Spark DataFrame?

Aug 17, 2022

scala apache-spark dataframe apache-spark-sql pivot

How to make saveAsTextFile NOT split output into multiple files?

Aug 17, 2022

scala apache-spark

How to prevent java.lang.OutOfMemoryError: PermGen space at Scala compilation?

Aug 17, 2022

scala apache-spark memory-management sbt scalatra-sbt

PySpark: Exception: Java gateway process exited before sending the driver its port number

Mar 09, 2022

java python macos apache-spark pyspark

How to efficiently count null and NaN values for each column in a PySpark DataFrame?

Aug 26, 2022

apache-spark pyspark apache-spark-sql

Spark difference between reduceByKey vs. groupByKey vs. aggregateByKey vs. combineByKey

Feb 20, 2022

apache-spark grouping reducing

Which cluster type should I choose for Spark?

Oct 29, 2022

apache-spark hadoop-yarn mesos apache-spark-standalone

How does HashPartitioner work?

Aug 17, 2022

scala apache-spark rdd partitioning

How to link PyCharm with PySpark?

Aug 17, 2022

python apache-spark pyspark pycharm homebrew

How to pass -D parameter or environment variable to Spark job?

Sep 17, 2022

scala apache-spark

Removing duplicates from rows based on specific columns in an RDD/Spark DataFrame

Aug 17, 2022

apache-spark apache-spark-sql pyspark

How to write unit tests in Spark 2.0+?

Aug 17, 2022

scala unit-testing apache-spark junit apache-spark-sql
