apache-spark tutorials and guides

Why is a Scala method serialisable while a function is not?

Nov 01, 2022

scala apache-spark

How to use correlation in Spark with Dataframes?

Oct 31, 2022

python apache-spark pyspark apache-spark-sql correlation

Is it possible to load word2vec pre-trained available vectors into spark?

Oct 31, 2022

scala apache-spark stanford-nlp word2vec

Spark with BloomFilter of billions of records causes Kryo serialization failed: Buffer overflow.

Oct 31, 2022

scala apache-spark bloom-filter bigdata

Spark df.write: quote all fields but not null values

Oct 31, 2022

csv apache-spark spark-dataframe

Misunderstanding of Spark RDD fault tolerance

Oct 30, 2022

apache-spark spark-streaming rdd distributed-computing fault-tolerance

How to fix 'DataFrame' object has no attribute 'coalesce'?

Oct 31, 2022

python apache-spark dataframe pyspark apache-spark-sql

Spark: understanding partitioning - cores

Oct 31, 2022

multithreading scala apache-spark cpu-cores

Spark Streaming Exception: java.util.NoSuchElementException: None.get

Oct 31, 2022

apache-spark hadoop apache-kafka apache-spark-sql spark-streaming

Calling another custom Python function from a PySpark UDF

Oct 30, 2022

python apache-spark pyspark user-defined-functions

Structured Streaming output is not showing on Jupyter Notebook

Oct 29, 2022

apache-spark pyspark jupyter-notebook spark-streaming spark-structured-streaming

Spark structured streaming: converting row to json

Jul 19, 2022

java json scala apache-spark spark-structured-streaming

How to compose column name using another column's value for withColumn in Scala Spark

Sep 22, 2022

scala apache-spark apache-spark-sql

In PySpark, why does `limit` followed by `repartition` create exactly equal partition sizes?

Nov 22, 2020

python apache-spark pyspark

AWS EMR Spark Python Logging

Mar 01, 2022

python apache-spark emr

PySpark: Take average of a column after using filter function

Sep 16, 2022

python apache-spark pyspark apache-spark-sql

How to avoid shuffles while joining DataFrames on unique keys?

Oct 15, 2022

apache-spark apache-spark-sql

Apache Flink vs Apache Spark as platforms for large-scale machine learning?

Mar 25, 2021

machine-learning apache-spark apache-flink
