apache-spark tutorials and guides

Apache Spark performance on AWS S3 vs EC2 HDFS

May 15, 2022

apache-spark

Merge two spark sql columns of type Array[string] into a new Array[string] column

Nov 15, 2022

scala apache-spark apache-spark-sql user-defined-functions

java.lang.IllegalArgumentException at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source) with Java 10

Sep 17, 2021

apache-spark pyspark

Spark MLlib Linear Regression model intercept is always 0.0?

Jun 11, 2019

scala apache-spark apache-spark-mllib

How to share Spark RDD between 2 Spark contexts?

May 15, 2022

apache-spark rdd

Scala code crashing with java.util.NoSuchElementException: next on empty iterator

Aug 10, 2022

scala apache-spark

How can we JOIN two Spark SQL dataframes using a SQL-esque "LIKE" criterion?

Oct 19, 2022

python apache-spark apache-spark-sql pyspark

Why does Spark save Map phase output to local disk?

Apr 28, 2022

apache-spark mapreduce rdd

Any way to access methods from individual stages in PySpark PipelineModel?

Aug 30, 2022

python apache-spark pyspark apache-spark-mllib apache-spark-ml

Apply a custom function to a spark dataframe group

Oct 29, 2022

apache-spark dataframe group-by dataset pyspark

Spark SQL and MySQL: SaveMode.Overwrite not inserting modified data

Oct 29, 2022

mysql apache-spark dataframe apache-spark-sql

How to choose the queue for Spark job using spark-submit?

Feb 12, 2022

apache-spark hadoop-yarn

Spark scala data frame udf returning rows

Feb 04, 2022

scala apache-spark user-defined-functions

How to create SQLContext in spark using scala?

Oct 28, 2022

scala apache-spark sbt apache-spark-sql

Spark (JAVA) - dataframe groupBy with multiple aggregations?

Nov 10, 2022

java apache-spark

Spark mapWithState API explanation

Nov 06, 2022

scala apache-spark spark-streaming

Why does Spark tell me "name 'sqlContext' is not defined", and how can I use sqlContext?

Feb 04, 2022

apache-spark apache-spark-sql

How to convert JavaPairInputDStream into DataSet/DataFrame in Spark

Feb 12, 2022

java apache-spark streaming apache-kafka spark-streaming

Why does spark-shell fail with "'""C:\Program' is not recognized as an internal or external command" on Windows?

Oct 01, 2022

windows apache-spark

How to zip two array columns in Spark SQL

Sep 16, 2022

python pandas apache-spark pyspark apache-spark-sql
