apache-spark tutorials and guides

GC Logs Overwritten when JVM Crashes

Jan 30, 2022

Spark Structured Streaming Checkpoint Compatibility

Mar 26, 2022

apache-spark apache-kafka spark-streaming spark-structured-streaming

What can cause a stage to reattempt in Spark

Nov 11, 2022

scala apache-spark

Zeppelin does not display stack trace

Sep 17, 2021

apache-spark apache-zeppelin

Using .where() on pyspark.sql.functions.max().over(window) on Spark 2.4 throws Java exception

Aug 22, 2022

apache-spark exception pyspark apache-spark-sql

Rerun Scala code with -deprecation using Apache Zeppelin

Mar 29, 2022

scala apache-spark apache-zeppelin

one-hot encode of multiple string categorical features using Spark DataFrames

Jun 21, 2022

python apache-spark pyspark apache-spark-sql bigdata

Getting an error while reading from S3 using PySpark: java.lang.IllegalArgumentException

Mar 01, 2022

python apache-spark amazon-s3 pyspark

Spark/k8s: How to run spark submit on Kubernetes with client mode

Apr 30, 2022

docker apache-spark kubernetes

Aggregate while dropping duplicates in pyspark

Jul 02, 2022

dataframe apache-spark pyspark apache-spark-sql databricks

Spark not ignoring empty partitions

Sep 27, 2022

performance apache-spark amazon-s3 partitioning parquet

Low parallelism when running Apache Beam wordcount pipeline on Spark with Python SDK

Jul 02, 2022

python apache-spark apache-beam

How to run a Spark-java program from command line

Aug 26, 2022

hadoop hdfs apache-spark

Apache Spark Throws java.lang.IllegalStateException: unread block data

Aug 07, 2021

scala hadoop hdfs apache-spark

Spark Standalone Mode multiple shell sessions (applications)

Jul 02, 2022

apache-spark

Specifying the output file name in Apache Spark

Aug 25, 2022

python apache-spark

Spark - convert string IDs to unique integer IDs

Jan 26, 2022

apache-spark

Usage of local variables in closures when accessing Spark RDDs

Mar 26, 2022

closures apache-spark rdd pyspark

How do you read and write from/into different ElasticSearch clusters using spark and elasticsearch-hadoop?

Nov 12, 2022

apache-spark elasticsearch hdfs elasticsearch-hadoop distributed-filesystem

How to format data for the Spark MLlib k-means clustering algorithm?

Nov 06, 2022

java algorithm machine-learning apache-spark
