apache-spark tutorials and guides

When to use a UDF versus a function in PySpark? [duplicate]

Jun 25, 2022

How to apply large python model to pyspark-dataframe?

Sep 08, 2022

python apache-spark machine-learning pyspark pyspark-sql

Spark Caused by: java.lang.StackOverflowError Window Function?

Sep 06, 2022

python scala apache-spark pyspark

JDBC to Spark Dataframe - How to ensure even partitioning?

Sep 06, 2022

apache-spark jdbc apache-spark-sql partitioning

PySpark window function on entire data frame

Oct 04, 2022

dataframe apache-spark pyspark apache-spark-sql window-functions

Spark Structured Streaming with Kafka SASL/PLAIN authentication

Sep 06, 2022

apache-spark apache-kafka spark-structured-streaming

Job 65 cancelled because SparkContext was shut down

Dec 05, 2021

apache-spark hadoop pyspark apache-spark-sql apache-zeppelin

PySpark - pass a value from another column as the parameter of spark function

Oct 29, 2022

apache-spark pyspark apache-spark-sql

NoClassDefFoundError: org/apache/spark/sql/internal/connector/SimpleTableProvider when running in Dataproc

May 22, 2022

apache-spark sbt google-cloud-dataproc

PySpark data skewness with Window Functions

Sep 25, 2022

apache-spark pyspark

In Spark, how does the parameter "minPartitions" work in SparkContext.textFile(path, minPartitions)?

Jun 20, 2022

apache-spark

How to query when connecting mongodb with apache-spark

Sep 24, 2022

mongodb hadoop apache-spark

Hadoop DistributedCache functionality in Spark

Aug 30, 2022

hadoop apache-spark distribute distributed-cache

Merge more than 32 files in Google Cloud Storage

Jan 06, 2020

google-cloud-storage apache-spark google-compute-engine

reduceByKey using Scala object as key

Dec 03, 2019

scala apache-spark reduce

launching a spark program using oozie workflow

Nov 18, 2022

scala apache-spark workflow oozie

custom join with non equal keys

Nov 10, 2021

join apache-spark

Ordering an RDD[String]

Aug 29, 2022

scala apache-spark

Apache Spark app workflow

Jun 24, 2022

apache-spark workflow

How to create collection of RDDs out of RDD?

Nov 13, 2022

scala apache-spark
