apache-spark tutorials and guides

local class incompatible Exception: when running spark standalone from IDE

Nov 02, 2022

java apache-spark

Disadvantages of Spark Dataset over DataFrame

Sep 24, 2022

apache-spark

Why doesn't spark.ml implement any of the spark.mllib algorithms?

Sep 17, 2022

machine-learning apache-spark pyspark apache-spark-mllib apache-spark-ml

Preserve index-string correspondence in Spark StringIndexer

Apr 04, 2016

python apache-spark apache-spark-sql pyspark apache-spark-ml

How can I set the default Spark logging level?

Aug 11, 2022

apache-spark pyspark

Meaning of Apache Spark warning "Calling spill() on RowBasedKeyValueBatch"

Oct 06, 2022

apache-spark pyspark warnings

Why is dataset.count causing a shuffle? (Spark 2.2)

Mar 30, 2022

scala apache-spark spark-dataframe rdd

Extract information from a `org.apache.spark.sql.Row`

Nov 13, 2022

scala apache-spark apache-spark-sql

What is the right way to save/load models in Spark/PySpark?

Oct 17, 2022

python apache-spark pyspark apache-spark-mllib

How to run independent transformations in parallel using PySpark?

Sep 17, 2022

python-2.7 apache-spark pyspark apache-spark-sql python-multiprocessing

How to limit functions.collect_set in Spark SQL?

Aug 20, 2022

apache-spark apache-spark-sql

Airflow SparkSubmitOperator - How to spark-submit in another server

Sep 17, 2022

apache-spark hadoop airflow

Why does a Spark RDD partition have a 2GB limit for HDFS?

Sep 05, 2022

scala apache-spark rdd

How to mount S3 bucket on Kubernetes container/pods?

Aug 18, 2022

apache-spark amazon-s3 kubernetes fuse s3fs

Why does a Spark application fail with "executor.CoarseGrainedExecutorBackend: Driver Disassociated"?

Oct 31, 2022

apache-spark apache-spark-sql

Spark ssc.textFileStream is not streaming any files from a directory

Oct 19, 2018

filesystems apache-spark spark-streaming data-stream

What's the difference between spark.eventLog.dir and spark.history.fs.logDirectory?

Sep 17, 2022

apache-spark

How to convert DataFrame to Dataset in Apache Spark in Java?

Oct 24, 2017

java apache-spark spark-dataframe apache-spark-dataset

How to subtract a column of days from a column of dates in Pyspark?

Sep 17, 2022

python apache-spark pyspark apache-spark-sql user-defined-functions

Write DataFrame to mysql table using pySpark

Oct 22, 2020

python mysql apache-spark pyspark apache-spark-sql
