apache-spark tutorials and guides

Spark: shuffle operation leading to long GC pause

Sep 05, 2022

Why does transform do side effects (println) only once in Structured Streaming?

Aug 24, 2022

scala apache-spark apache-spark-sql spark-structured-streaming

Issues with Logistic Regression for multiclass classification using PySpark

Oct 04, 2022

apache-spark pyspark apache-spark-mllib logistic-regression apache-spark-ml

Need to Know Partitioning Details in Dataframe Spark

Nov 18, 2019

apache-spark apache-spark-sql spark-dataframe

Is Hive faster than Spark?

Nov 11, 2022

hadoop apache-spark hive apache-tez bigdata

How to use Spark-Scala to download a CSV file from the web?

Jul 18, 2022

scala csv apache-spark

Turning a pandas expression into a PySpark expression

Aug 23, 2022

python pandas apache-spark group-by pyspark

Zeppelin java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$

Aug 14, 2021

macos apache-spark apache-zeppelin

Apache Spark - Dataset operations fail in abstract base class?

Aug 30, 2019

scala apache-spark abstract-class

Sort by date an Array of a Spark DataFrame Column

May 03, 2022

scala apache-spark dataframe apache-spark-sql

Scala + SBT - How to configure reference.conf for a shaded Akka library

Feb 10, 2021

apache-spark akka cloudera-cdh sbt-assembly shading

Processing (OSM) PBF files in Spark

Feb 22, 2022

scala apache-spark amazon-emr osm.pbf

Using stat.bloomFilter in Spark 2.0.0 to filter another dataframe

Dec 06, 2021

scala apache-spark apache-spark-sql apache-spark-dataset bloom-filter

Spark SQL "Limit"

Oct 28, 2019

hadoop apache-spark hive hortonworks-data-platform

spark-submit config through file

Jan 19, 2020

apache-spark spark-submit

Scala/Spark - Multiply an Integer with each value in a DataFrame Column

Nov 08, 2022

scala apache-spark

How to enable Tungsten optimization in Spark 2?

Oct 25, 2019

apache-spark pyspark apache-spark-sql apache-spark-2.0

Retrieve Spark Mllib StringIndexer column mapping

Nov 24, 2019

scala apache-spark apache-spark-mllib apache-spark-ml

Efficient way to join a cached spark dataframe with other and cache again

Nov 04, 2022

caching apache-spark dataframe union

Is it the driver or the workers that read the text file when sc.textFile is used?

May 01, 2022

scala file hadoop apache-spark io
