apache-spark tutorials and guides

Spark ML indexer cannot resolve DataFrame column name with dots?

Jan 02, 2019

Application attempt appattempt_*** doesn't exist in ApplicationMasterService cache

Jul 31, 2019

apache-spark

How to speed up Spark SQL unit tests?

Sep 17, 2022

unit-testing testing apache-spark apache-spark-sql

Why is Spark performing worse when using Kryo serialization?

Sep 17, 2022

scala performance apache-spark avro kryo

Spark 1.6: java.lang.IllegalArgumentException: spark.sql.execution.id is already set

Jul 11, 2022

scala apache-spark apache-spark-sql spark-dataframe

Comparison between fasttext and LDA

Sep 17, 2022

facebook scala apache-spark

How do you create merge_asof functionality in PySpark?

Sep 17, 2022

python pandas apache-spark pyspark apache-spark-sql

Spark - java IOException: Failed to create local dir in /tmp/blockmgr*

Jan 04, 2022

hadoop apache-spark apache-spark-sql

pyspark using one task for mapPartitions when converting rdd to dataframe

Sep 17, 2022

python apache-spark pyspark apache-spark-sql

Spark is only using one worker machine when more are available

Aug 19, 2022

python apache-spark pyspark

If I cache a Spark DataFrame and then overwrite the reference, will the original DataFrame still be cached?

Sep 17, 2022

python apache-spark pyspark apache-spark-sql

Output from Dataproc Spark job in Google Cloud Logging

Sep 08, 2022

apache-spark google-cloud-dataproc google-cloud-logging

Read and write empty string "" vs NULL in Spark 2.0.1

Sep 17, 2022

csv apache-spark

Apache Spark - Dealing with Sliding Windows on Temporal RDDs

Nov 20, 2022

algorithm scala apache-spark

Caching intermediate results in Spark ML pipeline

Sep 17, 2022

apache-spark apache-spark-ml

What is the correct way to start/stop spark streaming jobs in yarn?

Sep 29, 2022

hadoop apache-spark spark-streaming hadoop-yarn cloudera

Spark Java Error: Size exceeds Integer.MAX_VALUE

Sep 12, 2020

java python apache-spark distributed-computing logistic-regression

Dealing with a large gzipped file in Spark

Oct 20, 2022

apache-spark gzip amazon-emr

Extract document-topic matrix from PySpark LDA Model

Sep 16, 2022

python apache-spark pyspark lda

local class incompatible Exception: when running spark standalone from IDE

Nov 02, 2022

java apache-spark
