apache-spark tutorials and guides

How to save Spark RDD to local filesystem

Aug 30, 2022

Will Spark SQL completely replace Apache Impala or Apache Hive?

Nov 07, 2022

sql hadoop apache-spark hive impala

Filter dataframe by value NOT present in column of other dataframe [duplicate]

Sep 14, 2022

scala apache-spark apache-spark-sql spark-dataframe

PySpark: read multiple CSV files into a DataFrame (or RDD?)

May 06, 2022

python apache-spark pyspark spark-dataframe jupyter-notebook

How to handle millions of smaller S3 files with Apache Spark

Sep 12, 2022

amazon-web-services hadoop apache-spark amazon-s3

PySpark: merge two RDDs together

Nov 10, 2022

python apache-spark pyspark rdd

How to make OneHotEncoder in Spark work like OneHotEncoder in Pandas?

Aug 25, 2022

apache-spark pyspark one-hot-encoding

How long does RDD remain in memory?

Apr 03, 2022

apache-spark rdd

Pyspark ML - How to save pipeline and RandomForestClassificationModel

Oct 23, 2022

apache-spark pyspark apache-spark-mllib

Efficient string suffix detection

Jul 16, 2022

python apache-spark pyspark apache-spark-sql string-matching

Spark / Scala: Passing RDD to Function

Feb 26, 2022

scala apache-spark rdd

Why do I have to explicitly tell Spark what to cache?

Oct 04, 2022

apache-spark caching

How to apply a function to a column of a Spark DataFrame?

Oct 26, 2022

scala apache-spark dataframe apache-spark-sql

How do I convert a column of Unix epoch values to Date in an Apache Spark DataFrame using Java?

Nov 05, 2022

java apache-spark spark-dataframe

Query in Spark SQL inside an array

Dec 05, 2018

apache-spark apache-spark-sql spark-dataframe

Spark list all cached RDD names and unpersist

Feb 23, 2022

java scala dataframe apache-spark rdd

Request insufficient authentication scopes when running Spark-Job on dataproc

Oct 16, 2022

apache-spark google-cloud-platform google-cloud-dataproc

Unresolved reference while trying to import col from pyspark.sql.functions in python 3.5

Apr 06, 2022

python apache-spark pyspark pyspark-sql spark-structured-streaming

IllegalArgumentException thrown when calling the count and collect functions in Spark

Jul 10, 2022

python macos apache-spark pyspark python-3.6

Could not read data from JSON using PySpark

Nov 16, 2022

apache-spark pyspark
