apache-spark tutorials and guides

Failed to start master for Spark in Windows

Sep 16, 2022

apache-spark windows-10

How to exit spark-submit after the submission

Mar 25, 2022

apache-spark hadoop-yarn

Spark Random Forests: Different results with same seed

Oct 25, 2022

scala apache-spark machine-learning random-forest

Does Spark support Partition Pruning with Parquet Files?

Sep 13, 2022

apache-spark amazon-s3 hive parquet

Spark Kafka Direct DStream - How many executors and RDD partitions in yarn-cluster mode if num-executors is set?

Sep 07, 2022

apache-spark apache-kafka spark-streaming

Spark: efficiency of dataframe checkpoint vs. explicitly writing to disk

Aug 30, 2022

scala apache-spark apache-spark-sql

Why does Spark's OneHotEncoder drop the last category by default?

Aug 29, 2022

apache-spark machine-learning pyspark one-hot-encoding bigdata

Does collect_list() maintain relative ordering of rows?

Nov 17, 2022

scala apache-spark apache-spark-sql

org.apache.spark.SparkException: Job aborted due to stage failure: Task from application

Sep 26, 2022

apache-spark

"sparkContext was shut down" while running spark on a large dataset

Sep 28, 2022

scala apache-spark hadoop-yarn apache-spark-sql

Total size of serialized results of tasks is bigger than spark.driver.maxResultSize

Sep 14, 2022

apache-spark pyspark

Spark 2.0 deprecates 'DirectParquetOutputCommitter', how to live without it?

Jan 23, 2022

hadoop apache-spark amazon-s3 amazon-emr parquet

What is the best way to remove accents with Apache Spark dataframes in PySpark?

Sep 12, 2022

python apache-spark pyspark apache-spark-sql unicode-normalization

Hash function in spark

Sep 08, 2022

scala apache-spark hash apache-spark-sql

Spark - Which instance type is preferred for AWS EMR cluster?

Sep 12, 2022

amazon-ec2 apache-spark emr

Spark losing println() on stdout

May 07, 2020

scala apache-spark println accumulator

How to stop a running SparkContext before opening the new one

Sep 08, 2022

scala apache-spark

How to merge multiple feature vectors in DataFrame?

Sep 12, 2022

apache-spark machine-learning apache-spark-sql apache-spark-ml

Spark train test split

Oct 18, 2022

apache-spark apache-spark-mllib train-test-split

Stopping a Running Spark Application

Feb 28, 2022

apache-spark
