apache-spark tutorials and guides

argmax in Spark DataFrames: how to retrieve the row with the maximum value

Aug 22, 2022

apache-spark apache-spark-sql

How can I save an RDD into HDFS and later read it back?

Mar 15, 2022

scala apache-spark hdfs rdd bigdata

How to get all columns after groupby on Dataset<Row> in spark sql 2.1.0

Sep 18, 2022

apache-spark apache-spark-sql

How to create a copy of a dataframe in pyspark?

Mar 20, 2022

python apache-spark pyspark apache-spark-sql

Encountering "WARN ProcfsMetricsGetter: Exception when trying to compute pagesize" error when running Spark

Feb 02, 2022

python apache-spark pyspark

Is there an "Explain RDD" in Spark?

May 11, 2018

apache-spark rdd

How to extract application ID from the PySpark context

Oct 19, 2022

apache-spark hadoop-yarn pyspark

Case class equality in Apache Spark

Apr 08, 2022

scala apache-spark pattern-matching rdd case-class

How to connect HBase and Spark using Python?

Oct 16, 2022

python apache-spark hbase pyspark apache-spark-sql

Writing files to local system with Spark in Cluster mode

Oct 02, 2022

scala hadoop apache-spark

How to filter one spark dataframe against another dataframe

Sep 18, 2022

scala apache-spark apache-spark-sql spark-dataframe

How do I collect a single column in Spark?

Oct 17, 2019

apache-spark dataframe pyspark apache-spark-sql

How to set the number of partitions/nodes when importing data into Spark

Aug 23, 2022

sql apache-spark database-partitioning pyspark-sql

Spark Error: Not enough space to cache partition rdd_8_2 in memory! Free memory is 58905314 bytes

Jul 31, 2021

scala out-of-memory apache-spark rdd

Spark throws a stack overflow error when unioning a lot of RDDs

Sep 18, 2022

apache-spark rdd

Spark SQL filter multiple fields

Nov 07, 2022

scala apache-spark apache-spark-sql

Use Spark to list all files in a Hadoop HDFS directory?

Sep 18, 2022

scala apache-spark hadoop

Apache Drill vs Spark [closed]

Sep 18, 2022

hadoop apache-spark bigdata apache-drill

Building a StructType from a dataframe in pyspark

Sep 23, 2022

python apache-spark dataframe pyspark apache-spark-sql

How to select last row and also how to access PySpark dataframe by index?

Oct 25, 2022

python apache-spark pyspark apache-spark-sql pyspark-sql
