apache-spark tutorials and guides

PySpark transform method that's equivalent to the Scala Dataset#transform method

Aug 23, 2022

How to query datasets in Avro format?

Aug 22, 2022

apache-spark apache-spark-sql spark-avro

How to standardize ONE column in Spark using StandardScaler?

Sep 16, 2022

python apache-spark pyspark scale

What's the difference between Dataset.col() and functions.col() in Spark?

Nov 13, 2022

apache-spark apache-spark-sql

How to transpose/pivot row data to columns in Spark Scala? [duplicate]

Jun 12, 2022

scala apache-spark apache-spark-sql pivot

Spark-sqlserver connection

Oct 30, 2022

sql-server apache-spark data-analysis

How to make sure my DataFrame frees its memory?

May 28, 2022

scala apache-spark garbage-collection spark-dataframe

Exception in thread "main" java.lang.ExceptionInInitializerError when installing Spark without Hadoop

Sep 16, 2022

java apache-spark java-10

Join two DataFrames where the join key is different and only select some columns

Sep 07, 2022

apache-spark join pyspark spark-dataframe pyspark-sql

How to set an environment variable in Databricks?

Sep 15, 2022

apache-spark environment-variables databricks

Spark: How does salting work in dealing with skewed data?

Oct 27, 2022

apache-spark join group-by apache-spark-sql skew

What is ExternalRDDScan in the DAG?

Nov 19, 2022

apache-spark directed-acyclic-graphs internals

What is the difference between "predicate pushdown" and "projection pushdown"?

Aug 17, 2022

apache-spark bigdata parquet

How to calculate the size of a DataFrame in Spark Scala?

Jun 23, 2022

apache-spark apache-spark-sql spark-streaming

AttributeError: 'DataFrame' object has no attribute '_data'

Sep 14, 2022

python apache-spark pyspark databricks azure-databricks

Efficient boolean reductions `any`, `all` for PySpark RDD?

Oct 30, 2022

apache-spark

Trying to run SparkSQL over Spark Streaming

Nov 05, 2022

sql apache-spark spark-streaming

How to get the product of two RDDs?

Nov 09, 2022

scala apache-spark

Compute string length in Spark SQL DSL

Oct 21, 2022

apache-spark apache-spark-sql string-length

How to show the schema (including type) of a Parquet file from the command line or spark-shell?

Mar 29, 2022

scala apache-spark parquet

New posts in apache-spark