apache-spark tutorials and guides

Is there a way to rewrite Spark RDD distinct to use mapPartitions instead of distinct?

Oct 19, 2022

How to build a graph from tuples in GraphX and label the nodes afterwards?

Feb 14, 2018

scala serialization graph apache-spark

Why do Window functions fail with "Window function X does not take a frame specification"?

Oct 22, 2022

apache-spark pyspark apache-spark-sql window-functions pyspark-sql

How to add Hive properties at runtime in spark-shell

Sep 06, 2019

apache-spark hive

How to submit code to a remote Spark cluster from IntelliJ IDEA

Jun 24, 2021

intellij-idea apache-spark

Spark Python error "FileNotFoundError: [WinError 2] The system cannot find the file specified"

Nov 30, 2019

python python-3.x apache-spark pyspark

What is the most efficient way to do a sorted reduce in PySpark?

Oct 14, 2022

python python-2.7 apache-spark mapreduce pyspark

Combining Spark Streaming + MLlib

Nov 16, 2022

python apache-spark pyspark spark-streaming apache-spark-mllib

Read Kafka topic in a Spark batch job

Nov 04, 2022

scala apache-spark apache-kafka spark-streaming kafka-consumer-api

PySpark: retrieve mean and the count of values around the mean for groups within a dataframe

May 15, 2019

python sql apache-spark apache-spark-sql window-functions

Running Spark on Linux: $JAVA_HOME not set error

Sep 14, 2022

linux apache-spark java-home ubuntu-16.04

Inspecting GraphX Graph Object

Feb 06, 2017

apache-spark spark-graphx

GroupByKey with datasets in Spark 2.0 using Java

Aug 11, 2022

java apache-spark group-by dataset apache-spark-2.0

Outlier detection algorithm in Spark MLlib

May 31, 2022

apache-spark machine-learning apache-spark-mllib outliers

Hadoop Yarn: How to limit dynamic self allocation of resources with Spark?

Sep 07, 2022

hadoop apache-spark pyspark hadoop-yarn

How to make Spark driver resilient to Master restarts?

Oct 27, 2022

apache-spark apache-spark-standalone

Spark: SAXParseException while writing to Parquet on S3

Apr 26, 2022

scala hadoop apache-spark amazon-s3

How to use "cube" only for specific fields on Spark dataframe?

May 05, 2021

scala apache-spark dataframe apache-spark-sql cube

Spark: GraphX API OOM errors after unpersisting unused RDDs

Apr 26, 2022

apache-spark out-of-memory spark-graphx

How does back pressure property work in Spark Streaming?

Aug 17, 2022

hadoop apache-spark spark-streaming backpressure
