apache-spark tutorials and guides

How do I get a SQL row_number equivalent for a Spark RDD?

Sep 06, 2022

Understanding Spark physical plan

Sep 06, 2022

sql apache-spark query-optimization apache-spark-sql catalyst

AssertionError: col should be Column

Sep 06, 2022

python apache-spark pyspark apache-spark-sql

Encode and assemble multiple features in PySpark

Sep 05, 2022

python apache-spark apache-spark-sql apache-spark-mllib apache-spark-ml

Condition in map function

Sep 06, 2022

scala apache-spark spark-streaming map-function

How to calculate sum and count in a single groupBy?

Sep 06, 2022

scala apache-spark apache-spark-sql

How to create a udf in PySpark which returns an array of strings?

Jan 16, 2022

python apache-spark pyspark apache-spark-sql user-defined-functions

Why does starting StreamingContext fail with “IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute”?

Jan 04, 2019

java apache-spark spark-streaming

Rolling your own reduceByKey in Spark Dataset

Sep 06, 2022

scala apache-spark mapreduce

In Apache Spark, why does RDD.union not preserve the partitioner?

Sep 06, 2022

apache-spark partitioning hadoop-partitioning

PySpark and broadcast join example

Sep 06, 2022

python apache-spark apache-spark-sql pyspark

Spark union column order

Sep 28, 2022

apache-spark pyspark apache-spark-sql pyspark-sql

How to find Spark's installation directory?

Sep 30, 2022

java ubuntu apache-spark

Join two ordinary RDDs with/without Spark SQL

Sep 05, 2022

scala join apache-spark rdd apache-spark-sql

Multiple condition filter on dataframe

Sep 05, 2022

python apache-spark dataframe pyspark apache-spark-sql

Left Anti join in Spark?

Sep 05, 2022

scala apache-spark

SQL query in Spark/Scala: Size exceeds Integer.MAX_VALUE

Jan 20, 2022

sql apache-spark amazon-ec2 emr

Why does Spark application fail with “ClassNotFoundException: Failed to find data source: kafka” as uber-jar with sbt assembly?

Oct 19, 2022

scala apache-spark sbt sbt-assembly spark-structured-streaming

Is it possible to alias columns programmatically in Spark SQL?

Sep 05, 2022

scala apache-spark apache-spark-sql

How to add a new library like spark-csv to the prebuilt version of Apache Spark

Sep 05, 2022

python apache-spark apache-spark-sql
