apache-spark tutorials and guides

Spark Scala: retrieve the schema and store it

Mar 18, 2022

How to write a DataFrame schema to file in Scala

Oct 21, 2022

scala apache-spark dataframe apache-spark-sql

How to Create a Database in Spark SQL

May 30, 2022

apache-spark apache-spark-sql

Invalidate metadata/refresh Impala from Spark code

Sep 11, 2022

hadoop apache-spark impala

Understanding Representation of Vector Column in Spark SQL

Nov 05, 2022

apache-spark apache-spark-sql apache-spark-mllib apache-spark-ml

How to Read Data from DB in Spark in parallel

Jun 02, 2022

apache-spark jdbc apache-spark-sql

How to do aggregation on multiple columns at once in Spark

Sep 05, 2022

scala apache-spark

spark jdbc df limit... what is it doing?

Sep 30, 2021

apache-spark apache-spark-sql

How to get max length of string column from dataframe using scala?

Sep 06, 2022

scala apache-spark apache-spark-sql max

Custom partitioner in Spark (PySpark)

May 09, 2022

apache-spark pyspark

Check if arraytype column contains null

May 20, 2022

scala apache-spark dataframe null apache-spark-sql

PySpark, top for DataFrame

Sep 05, 2022

apache-spark dataframe pyspark spark-dataframe

Writing Spark dataframe as parquet to S3 without creating a _temporary folder

May 16, 2022

hadoop apache-spark amazon-s3 pyspark

How to export data from Cassandra to BigQuery

Jun 01, 2022

apache-spark cassandra pyspark google-bigquery google-cloud-platform

How to get date from different year, month and day columns in spark (scala)

Nov 13, 2022

dataframe scala apache-spark date apache-spark-sql

How to wait until all executors are allocated before Spark application starts on YARN?

May 07, 2022

apache-spark hadoop-yarn amazon-emr

Build Spark SQL query dynamically

Oct 14, 2022

scala apache-spark apache-spark-sql

Why does Spark on YARN in cluster mode fail with "Exception in thread "Driver" java.lang.NullPointerException"?

Jan 01, 2021

apache-spark nullpointerexception emr

PySpark: create dataframe from random uniform distribution

May 04, 2022

python apache-spark pyspark

How to force a certain partitioning in a PySpark DataFrame?

Oct 03, 2021

apache-spark pyspark partitioning
