apache-spark tutorials and guides

Aggregating multiple columns with custom function in Spark

Aug 30, 2022

Specifying the filename when saving a DataFrame as a CSV [duplicate]

Aug 30, 2022

scala csv apache-spark pyspark

Calling Java/Scala function from a task

Jul 29, 2017

python scala apache-spark pyspark apache-spark-mllib

Getting the count of records in a data frame quickly

Aug 30, 2022

scala apache-spark hadoop-streaming

PySpark: rolling average using time series data

Sep 12, 2022

apache-spark pyspark window-functions moving-average

Where do you need to use lit() in PySpark SQL?

Mar 08, 2022

python apache-spark pyspark apache-spark-sql

Spark on YARN concept understanding

Aug 30, 2022

hadoop apache-spark hdfs hadoop-yarn

Is there a better way to display an entire Spark SQL DataFrame?

Aug 30, 2022

scala apache-spark apache-spark-sql

PySpark row-wise function composition

May 06, 2022

python apache-spark pyspark apache-spark-sql

Spark SQL - CASE WHEN THEN

Sep 24, 2022

sql apache-spark

How to conditionally replace a value in a column based on an expression evaluated on another column in PySpark?

Aug 30, 2022

apache-spark pyspark apache-spark-sql pyspark-sql

Can I add arguments to Python code when I submit a Spark job?

Aug 30, 2022

python apache-spark cluster-mode

PySpark create new column with mapping from a dict

Aug 30, 2022

python apache-spark dictionary pyspark apache-spark-sql

DataFrame join optimization - Broadcast Hash Join

Aug 30, 2022

apache-spark dataframe apache-spark-sql apache-spark-1.4

How to exclude multiple columns in a Spark DataFrame in Python

Aug 30, 2022

apache-spark dataframe pyspark apache-spark-sql

“value $ is not a member of StringContext” - Missing Scala plugin?

Mar 16, 2022

scala apache-spark

Understanding Spark's caching

Aug 29, 2022

apache-spark

Viewing the content of a Spark DataFrame column

Aug 29, 2022

python apache-spark dataframe pyspark

Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill)

Oct 05, 2022

apache-spark impala apache-drill

Schema evolution in parquet format

Aug 29, 2022

apache-spark hadoop data-warehouse avro parquet
