apache-spark tutorials and guides

Spark Streaming hangs with Kafka starting offset at earliest (Kafka 2, Spark 2.4.3)

Mar 05, 2023

Refresh metadata for a DataFrame while reading a Parquet file

Mar 05, 2023

apache-spark apache-spark-sql parquet apache-spark-dataset

Add a new column to a PySpark DataFrame from a Python list

Mar 04, 2023

python apache-spark pyspark apache-spark-sql

pandas_udf error RuntimeError: Result vector from pandas_udf was not the required length: expected 12, got 35

Mar 05, 2023

python apache-spark pyspark

What is the difference between broadcast hash join and broadcast nested loop join in Spark?

Mar 04, 2023

apache-spark

Flattening an array of structs in PySpark

Mar 05, 2023

apache-spark pyspark apache-spark-sql

How to write a Kafka producer in Scala

Mar 05, 2023

scala apache-spark apache-kafka kafka-producer-api

Azure Databricks, could not initialize class org.apache.spark.eventhubs.EventHubsConf

Mar 05, 2023

scala azure apache-spark databricks azure-databricks

How to use variables in SQL queries?

Mar 04, 2023

apache-spark apache-spark-sql databricks

Is writing to Google Cloud Storage with the v2 algorithm safe?

Mar 04, 2023

apache-spark apache-spark-sql google-cloud-storage

Populate a column based on the previous value and row in PySpark

Mar 03, 2023

apache-spark pyspark apache-spark-sql

Spark explode array column to columns

Mar 04, 2023

java arrays apache-spark pyspark apache-spark-sql

What is RDD dependency in Spark?

Feb 13, 2023

apache-spark rdd

In Spark SQL/HiveQL, how to select a column that is a reserved keyword

Feb 13, 2023

apache-spark hiveql apache-spark-sql

Error while trying to run Spark

Feb 12, 2023

linux git apache-spark

How to store and read data from a Spark PairRDD

Feb 12, 2023

apache-spark

How to set offset committed by the consumer group using Spark's Direct Stream for Kafka?

Feb 12, 2023

java apache-spark apache-kafka spark-streaming

How to use the BLAS library in Spark?

Feb 11, 2023

scala apache-spark blas

Return an RDD from takeOrdered, instead of a list

Feb 10, 2023

python apache-spark rdd

PySpark: Many features to Labeled Point RDD

Feb 11, 2023

apache-spark pyspark rdd apache-spark-mllib

New posts in apache-spark