apache-spark tutorials and guides

How do I use Spark ORC indexes?

Sep 06, 2022

apache-spark orc

Get a registered Spark Accumulator by name

Aug 25, 2022

scala apache-spark

PySpark: spark-submit not working like the CLI

Oct 20, 2022

apache-spark pyspark

PySpark SparkSession Builder with Kubernetes Master

Dec 21, 2019

apache-spark pyspark kubernetes jupyter

Outer join two Datasets (not DataFrames) in Spark Structured Streaming

Aug 28, 2022

scala apache-spark apache-spark-sql spark-structured-streaming

In Spark ML, why is fitting a StringIndexer on a column with millions of distinct values yielding an OOM error?

Oct 24, 2022

apache-spark pyspark apache-spark-ml

Spark Structured Streaming Window on non-timestamp column

Sep 13, 2022

scala apache-spark spark-streaming aggregate-functions spark-structured-streaming

Access AWS Glue from local Spark

May 15, 2022

amazon-web-services apache-spark apache-spark-sql aws-glue

PySpark: Deserializing an Avro-serialized message contained in an Event Hubs Capture Avro file

May 12, 2020

apache-spark pyspark avro azure-eventhub-capture

How to get the table name from Spark SQL Query [PySpark]?

Apr 12, 2022

python sql scala apache-spark pyspark

Fastest way to take the elementwise sum of two Lists

Nov 14, 2022

scala list performance apache-spark elementwise-operations

Spark and Hive in Hadoop 3: Difference between metastore.catalog.default and spark.sql.catalogImplementation

Sep 16, 2022

apache-spark hadoop hive hive-metastore hadoop3

How to convert a struct field in a Row to an avro record in Spark Java

Sep 06, 2022

java apache-spark avro spark-avro

High Concurrency Clusters in Databricks

May 29, 2022

scala apache-spark databricks

Cassandra + Solr/Hadoop/Spark - Choosing the right tools

Nov 09, 2022

hadoop solr cassandra analytics apache-spark

Spark SQL JDBC Support

Nov 21, 2022

apache-spark

How to convert scala.collection.Set to a serializable java.util.Set within an RDD

Aug 19, 2020

java serialization apache-spark scala-2.9 rdd

Spark Streaming groupByKey and updateStateByKey implementation

Apr 03, 2022

scala apache-spark spark-streaming

Spark SQL performance

Nov 02, 2022

java hbase apache-spark rdd apache-spark-sql

Using partitionBy to split and efficiently compute RDD groups by key

Nov 19, 2018

apache-spark rdd
