apache-spark tutorials and guides

Spark: Why does Python significantly outperform Scala in my use case?

Oct 11, 2022

How to find the most recent partition in a Hive table

Oct 09, 2022

hadoop apache-spark hive

Extracting `Seq[(String,String,String)]` from a Spark DataFrame

Apr 04, 2021

scala apache-spark dataframe apache-spark-sql

Spark without Hadoop: Failed to Launch

Sep 02, 2022

hadoop apache-spark hive

Converting pandas DataFrames to Spark DataFrames in Zeppelin

Sep 15, 2022

pandas apache-spark dataframe apache-zeppelin

Getting NullPointerException when running Spark code in Zeppelin 0.7.1

Nov 10, 2022

apache-spark apache-zeppelin

Creating a Spark DataFrame from a NumPy matrix

Jul 19, 2018

numpy apache-spark pyspark apache-spark-sql apache-spark-mllib

Why does Spark Planner prefer sort merge join over shuffled hash join?

Sep 13, 2022

apache-spark join apache-spark-sql

Kafka topic partitions to Spark streaming

Nov 22, 2020

apache-spark apache-kafka spark-streaming

java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$ while running TwitterPopularTags

Jan 10, 2019

scala maven apache-spark noclassdeffounderror spark-streaming

Why does a Spark job fail with "Exit code: 52"?

Apr 01, 2022

apache-spark hadoop-yarn spark-dataframe

How to explode columns?

Oct 23, 2022

apache-spark dataframe spark-dataframe

Spark SQL SaveMode.Overwrite, getting java.io.FileNotFoundException and requiring 'REFRESH TABLE tableName'

Sep 01, 2018

apache-spark apache-spark-sql

How to get word details from a TF vector RDD in Spark MLlib?

Oct 02, 2017

apache-spark apache-spark-mllib tf-idf apache-spark-ml

Cleaning up Spark history logs

Oct 25, 2022

apache-spark

Partitioning by multiple columns in PySpark with columns in a list

Sep 15, 2022

apache-spark pyspark window-functions

Spark SQL filtering (selecting with a where clause) with multiple conditions

Feb 11, 2019

python sql apache-spark apache-spark-sql pyspark

How to count boolean values in a grouped Spark DataFrame

Aug 27, 2022

python sql apache-spark pyspark apache-spark-sql

Spark DataFrame: validating column names for Parquet writes

Aug 24, 2022

apache-spark pyspark apache-spark-sql spark-streaming parquet

How are jobs assigned to executors in Spark Streaming?

Oct 21, 2022

job-scheduling apache-spark executor
