apache-spark tutorials and guides

How to select multiple columns of dataset, given a list of column names?

May 08, 2022

java apache-spark apache-spark-sql

Spark decimal type precision loss

Jun 16, 2022

scala apache-spark apache-spark-sql

Comparison of a `float` to `np.nan` in Spark Dataframe

Sep 07, 2022

python numpy apache-spark pyspark nan

How do I get a Spark DataFrame to print its explain plan to a string?

Nov 17, 2022

scala apache-spark dataframe

How to find the max String length of a column in Spark using dataframe?

Sep 15, 2022

scala apache-spark apache-spark-sql

Spark: How to aggregate/reduce records based on time difference?

Sep 15, 2022

dataframe apache-spark pyspark apache-spark-sql rdd

Reading Excel (.xlsx) file in pyspark

Nov 04, 2022

apache-spark pyspark spark-excel

What is the optimal way to read from multiple Kafka topics and write to different sinks using Spark Structured Streaming?

Aug 26, 2022

apache-spark pyspark apache-kafka spark-structured-streaming

Elasticsearch for Spark 3.0

Feb 20, 2022

apache-spark elasticsearch

"'JavaPackage' object is not callable" error executing explain() in PySpark 3.0.1 via Zeppelin

Aug 29, 2022

apache-spark pyspark

Workaround for Scala RDD not being covariant

Oct 28, 2022

scala types covariance apache-spark

Apache Spark ALS Recommendation Rating values higher than range

Oct 31, 2022

apache-spark machine-learning apache-spark-mllib collaborative-filtering

Spark: Counting co-occurrence - Algorithm for efficient multi-pass filtering of huge collections

Mar 29, 2022

algorithm scala group-by apache-spark filtering

Joining two spark dataframes on time (TimestampType) in python

Oct 02, 2019

join apache-spark apache-spark-sql pyspark

Write an RDD into HDFS in a Spark Streaming context

Oct 18, 2022

scala hadoop apache-spark hdfs spark-streaming

Writing to Oracle Database using Apache Spark 1.4.0

May 21, 2019

oracle scala jdbc apache-spark

Spark SQL equivalent of QUALIFY + ROW_NUMBER statements

Aug 17, 2022

sql apache-spark apache-spark-sql window-functions row-number

What does $( ) mean in Scala?

Nov 06, 2022

scala apache-spark

Iterated take() or batch processing for Spark?

May 11, 2021

apache-spark

Spark dataframes: Extract a column based on the value of another column

Jun 27, 2019

scala apache-spark dataframe apache-spark-sql
