apache-spark tutorials and guides

How to score all user-product combinations in Spark MatrixFactorizationModel?

Nov 02, 2017

apache-spark apache-spark-mllib matrix-factorization

Resources/Documentation on how the failover process works for the Spark Driver (and its YARN container) in yarn-cluster mode

Nov 14, 2019

apache-spark hadoop hadoop-yarn alluxio

Spark can't pickle method_descriptor

Sep 04, 2020

python hbase apache-spark pickle happybase

In-order processing in Spark Streaming

Nov 02, 2022

apache-spark spark-streaming

Spark-Shell: How to define JAR loading order

Jun 20, 2022

scala apache-spark

Lambda Architecture with Apache Spark

Aug 24, 2019

cassandra apache-spark apache-kafka lambda-architecture

Spark DataFrames with Parquet and Partitioning

Sep 08, 2019

apache-spark apache-spark-sql parquet

Spark metrics on wordcount example

Jul 07, 2018

apache-spark metrics

Spark: Input a vector

Apr 26, 2022

scala apache-spark apache-spark-mllib

Spark example program runs very slow

Aug 23, 2022

performance apache-spark pyspark transitive-closure

Data shuffle for Hive and Spark window function

Jan 20, 2020

python hadoop apache-spark hive pyspark

How to build a sparse matrix in PySpark?

Jul 12, 2020

python apache-spark pyspark sparse-matrix recommendation-engine

Kryo: deserialize old version of class

Aug 30, 2021

scala serialization apache-spark spark-streaming kryo

Group by and order by in Spark SQL

Oct 14, 2022

apache-spark apache-spark-sql

CodeGen grows beyond 64 KB error when normalizing large PySpark dataframe

Dec 09, 2021

apache-spark pyspark apache-spark-sql pyspark-sql window-functions

How to have Apache Spark running on GPU?

Apr 09, 2018

apache-spark cuda opencl gpu cpu

Read parquet into spark dataset ignoring missing fields [duplicate]

Dec 14, 2019

apache-spark apache-spark-sql parquet apache-spark-dataset apache-spark-2.0

How to get the number of records written (using DataFrameWriter's save operation)?

Nov 03, 2022

scala apache-spark apache-spark-sql

Spark - csv read option

Aug 25, 2022

apache-spark

YARN applications cannot start when specifying YARN node labels

Nov 11, 2022

hadoop apache-spark hadoop-yarn google-cloud-dataproc
