apache-spark tutorials and guides

foreach function not working in Spark DataFrame

May 16, 2021

Dropping columns by data type in Scala Spark

Nov 11, 2022

scala apache-spark

Spark: unpersist RDDs for which I have lost the reference

Nov 11, 2022

scala apache-spark

Redirect Spark console logs into a file

Sep 05, 2022

apache-spark

How to expire state of dropDuplicates in structured streaming to avoid OOM?

Nov 21, 2022

apache-spark duplicates apache-spark-sql out-of-memory spark-structured-streaming

Workaround for importing spark implicits everywhere

Sep 02, 2020

scala apache-spark spark-dataframe apache-spark-2.0 implicits

spark-submit Error: No main class set in JAR; please specify one with --class

Jun 08, 2022

apache-spark

java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V

Feb 20, 2022

maven hadoop apache-spark intellij-idea

Does Kryo help in SparkSQL?

Sep 16, 2022

apache-spark apache-spark-sql kryo

StackOverflowError when operating with a large number of columns in Spark

Aug 26, 2022

scala apache-spark mapreduce spark-dataframe stack-overflow

How to write a Dataset to Kafka topic?

Oct 25, 2022

scala apache-spark apache-kafka apache-spark-sql

How to use Spark lag and lead over group by and order by

Nov 15, 2022

apache-spark apache-spark-sql apache-spark-dataset

Overwrite column values using other column values based on conditions in PySpark

Sep 05, 2022

apache-spark pyspark

Spark CSV reading speed is very slow although I increased the number of nodes

Jan 30, 2022

scala csv apache-spark hadoop google-compute-engine

Outlier detection in PySpark

Feb 03, 2022

python-3.x apache-spark pyspark

Apache Spark and NiFi Integration

Oct 27, 2022

apache-spark apache-nifi

Group by column "grp" and compress DataFrame (take the last non-null value for each column, ordered by column "ord")

Feb 18, 2022

scala apache-spark aggregate-functions aggregation

Adding a new column in the first ordinal position in a PySpark DataFrame

Mar 06, 2022

python apache-spark pyspark apache-spark-sql

Spark RDD partition by key in exclusive way

Aug 23, 2022

apache-spark pyspark rdd

PySpark error: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>

Nov 10, 2022

python apache-spark pyspark apache-spark-sql
