apache-spark tutorials and guides

How to handle an AnalysisException on Spark SQL?

Sep 05, 2022

What does in-memory data storage mean in the context of Apache Spark?

Nov 09, 2022

hadoop apache-spark

In Apache Spark, how to set worker/executor environment variables?

Oct 28, 2022

amazon-web-services amazon-s3 apache-spark distributed-computing

SparkSQL error Table Not Found

Oct 22, 2018

sql scala apache-spark cassandra

NoSuchMethodException in MaxMind GeoIp dependency jackson-databind built with mvn shade

Jun 06, 2018

scala maven apache-spark jackson maxmind

DBSCAN on Spark: which implementation?

Jul 30, 2019

scala apache-spark cluster-analysis apache-spark-mllib dbscan

What are the differences between sc.parallelize and sc.textFile?

Sep 30, 2021

apache-spark pyspark rdd

basedir must be absolute: ?/.ivy2/local

Sep 12, 2022

apache-spark pyspark ivy jupyterhub

Spark: Is "count" on Grouped Data a Transformation or an Action?

Apr 10, 2022

scala apache-spark

Saving result of DataFrame show() to string in pyspark

Sep 14, 2022

python apache-spark pyspark apache-spark-sql

java+spark: org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException

Oct 17, 2022

java serialization apache-spark

How to interpret RDD.treeAggregate

Oct 31, 2022

scala apache-spark rdd distributed-computing

PySpark DataFrame unable to drop duplicates

Oct 24, 2022

python apache-spark pyspark apache-spark-sql pyspark-sql

Parallelize / avoid foreach loop in spark

Jun 02, 2022

scala apache-spark foreach dataframe

Using spark-submit with python main

May 27, 2019

apache-spark pyspark

Apply a function to groupBy data with pyspark

Aug 23, 2022

apache-spark pyspark

PySpark - Creating a data frame from text file

Nov 07, 2022

python-2.7 apache-spark apache-spark-sql spark-dataframe pyspark-sql

PySpark DataFrame filter using logical AND over list of conditions -- Numpy All Equivalent

Nov 01, 2021

python numpy apache-spark pyspark apache-spark-sql

How to solve a YARN container sizing issue on Spark?

Oct 04, 2019

apache-spark pyspark hadoop-yarn

Dataframe transpose with pyspark in Apache Spark

Apr 10, 2022

python apache-spark dataframe pyspark transpose
