apache-spark tutorials and guides

Get the size/length of an array column

Aug 29, 2022

scala apache-spark apache-spark-sql

What is an RDD in Spark?

Oct 16, 2022

scala hadoop apache-spark rdd

Spark DataFrame: drop duplicates and keep first

Aug 29, 2022

apache-spark dataframe duplicates pyspark apache-spark-sql

Spark 2.1.0 session config settings (PySpark)

Aug 29, 2022

python apache-spark pyspark spark-dataframe

What is the meaning of partitionColumn, lowerBound, upperBound, numPartitions parameters?

Jun 07, 2018

apache-spark jdbc apache-spark-sql

PySpark: Parse a column of JSON strings

Aug 29, 2022

python json apache-spark pyspark

What is the difference between Apache Spark SQLContext vs HiveContext?

Sep 30, 2022

apache-spark hive apache-spark-sql

Spark RDD to DataFrame in Python

Aug 28, 2022

python apache-spark pyspark spark-dataframe

Efficient Count Distinct with Apache Spark

Aug 28, 2022

distinct apache-spark

Spark extracting values from a Row

Aug 28, 2022

scala apache-spark apache-spark-sql

FetchFailedException or MetadataFetchFailedException when processing big data set

Aug 28, 2022

apache-spark hadoop-yarn

How to debug Spark application locally?

Aug 28, 2022

apache-spark

How do I unit test PySpark programs?

Oct 21, 2022

python unit-testing apache-spark pyspark

Joining Spark dataframes on the key

Aug 28, 2022

scala apache-spark dataframe apache-spark-sql

Spark 1.4 increase maxResultSize memory

Aug 28, 2022

python memory apache-spark pyspark jupyter

How to handle categorical features with spark-ml?

Aug 28, 2022

apache-spark categorical-data apache-spark-ml apache-spark-mllib

Filtering a PySpark DataFrame with SQL-like IN clause

Sep 05, 2022

python sql apache-spark dataframe pyspark

What is a task in Spark? How does the Spark worker execute the jar file?

Aug 28, 2022

apache-spark distributed-computing

Difference between Dataset API and DataFrame API [duplicate]

Sep 12, 2022

dataframe apache-spark apache-spark-sql rdd apache-spark-dataset

Application report for application_ (state: ACCEPTED) never ends for Spark Submit (with Spark 1.2.0 on YARN)

Nov 16, 2022

apache-spark hadoop-yarn amazon-emr amazon-kinesis
