apache-spark tutorials and guides

Apache Spark: how to append a new column from a list/array to a Spark DataFrame

Jun 14, 2022

Pyspark: Is there an equivalent method to pandas info()?

Jan 02, 2021

python pandas apache-spark pyspark

Getting last value of group in Spark

Nov 10, 2018

apache-spark pyspark spark-dataframe sparkr

How to read streaming data in XML format from Kafka?

Aug 24, 2022

apache-spark xml-parsing pyspark-sql spark-structured-streaming

How to flatten columns of type array of structs (as returned by Spark ML API)?

Aug 10, 2022

apache-spark apache-spark-sql apache-spark-ml

Splitting a column in pyspark

Nov 20, 2022

python apache-spark pyspark

Spark: Return empty column if column does not exist in dataframe

Nov 06, 2022

apache-spark pyspark apache-spark-sql pyspark-sql

Apache Spark startsWith in SQL expression

Sep 07, 2022

scala apache-spark apache-spark-sql

Spark AnalysisException when "flattening" DataFrame in Spark SQL

Aug 25, 2022

apache-spark apache-spark-sql

Pyspark - Cumulative sum with reset condition

Jun 24, 2022

python dataframe apache-spark pyspark cumulative-sum

How to find the max value of multiple columns?

Nov 07, 2022

scala apache-spark apache-spark-sql

How to set up Zeppelin to work with remote EMR Yarn cluster

Aug 29, 2022

apache-spark hadoop-yarn emr apache-zeppelin

Spark Convert Data Frame Column to dense Vector for StandardScaler() "Column must be of type org.apache.spark.ml.linalg.VectorUDT"

Mar 09, 2022

python apache-spark pyspark apache-spark-sql apache-spark-ml

Java Apache Spark: Long transformation chains result in quadratic time

May 15, 2019

java apache-spark

Pyspark Dataframe Join using UDF

Feb 07, 2022

python apache-spark pyspark apache-spark-sql user-defined-functions

set spark.streaming.kafka.maxRatePerPartition for createDirectStream

Sep 16, 2022

apache-spark spark-streaming

pyspark 1.6.0 write to parquet gives "path exists" error

Oct 15, 2021

apache-spark pyspark

How to run a Scala program in terminal?

May 23, 2022

scala shell apache-spark terminal

spark sql count(*) query store result

Nov 14, 2022

sql apache-spark apache-spark-sql

Spark Parquet Loader: Reduce number of jobs involved in listing a dataframe's files

Oct 15, 2022

apache-spark pyspark
