apache-spark tutorials and guides

Using Silhouette Clustering in Spark

Oct 06, 2022

Convert value depending on a type in SparkSQL via case matching of type

Oct 15, 2021

scala apache-spark

How to flatten nested lists in PySpark?

Jun 14, 2018

python apache-spark rdd

How to force Spark to evaluate DataFrame operations inline

Sep 05, 2022

apache-spark lazy-evaluation distributed-computing rdd spark-dataframe

Run Command on EMR Slaves?

Nov 12, 2022

apache-spark hadoop-yarn emr amazon-emr

How does Spark manage stages?

May 21, 2019

apache-spark

What row is used in dropDuplicates operator?

Oct 18, 2022

apache-spark pyspark apache-spark-sql

Create an empty array column of certain type in pyspark DataFrame

Nov 02, 2022

python dataframe apache-spark pyspark

Ignoring non-spark config property: hive.exec.dynamic.partition.mode

Jun 26, 2022

apache-spark spark-shell

How to CREATE TABLE USING delta with Spark 2.4.4?

May 02, 2022

apache-spark apache-spark-sql delta-lake

Write and read raw byte arrays in Spark using SequenceFile

Dec 04, 2019

scala hadoop hdfs apache-spark sequencefile

How to check if Spark RDD is in memory?

Oct 19, 2022

apache-spark rdd in-memory

Can Spark code be run on cluster without spark-submit?

Nov 05, 2022

apache-spark hadoop-yarn

How to save a Spark RDD in gzip format through PySpark

Aug 10, 2019

python apache-spark pyspark

Parquet predicate pushdown

Sep 12, 2022

hadoop apache-spark parquet bigdata

How to map variable names to features after pipeline

Feb 02, 2022

scala apache-spark apache-spark-mllib apache-spark-ml

Find minimum for a timestamp through Spark groupBy dataframe

Apr 22, 2022

sql scala apache-spark apache-spark-sql

Config file to define JSON Schema Structure in PySpark

Nov 09, 2022

python apache-spark pyspark apache-spark-sql

Spark Context is not automatically created in Scala Spark Shell

Nov 10, 2022

apache-spark

Number of Executors in Spark Local Mode

Oct 24, 2022

scala apache-spark
