apache-spark tutorials and guides

Apache Spark and NiFi Integration

Oct 27, 2022

apache-spark apache-nifi

Group by column "grp" and compress DataFrame (take the last non-null value for each column, ordered by column "ord")

Feb 18, 2022

scala apache-spark aggregate-functions aggregation

Adding a new column in the first ordinal position in a pyspark dataframe

Mar 06, 2022

python apache-spark pyspark apache-spark-sql

Spark RDD partition by key in exclusive way

Aug 23, 2022

apache-spark pyspark rdd

PySpark error: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>

Nov 10, 2022

python apache-spark pyspark apache-spark-sql

AWS: EMR cluster fails with "ERROR UserData: Error encountered while try to get user data" when submitting a Spark job

Aug 09, 2021

amazon-web-services apache-spark amazon-emr

How to use foreach or foreachBatch in PySpark to write to database?

Sep 24, 2022

apache-spark pyspark apache-kafka spark-structured-streaming

Why is repartition faster than partitionBy in Spark?

Sep 12, 2022

apache-spark pyspark apache-spark-sql apache-spark-xml

How to parallelize an RDD?

Sep 24, 2022

scala apache-spark

How to rename a huge number of files in Hadoop/Spark?

Nov 13, 2022

hadoop parallel-processing bigdata apache-spark

Spark - How to use the trained recommender model in production?

Sep 13, 2022

apache-spark mahout recommendation-engine mahout-recommender

Shuffled vs non-shuffled coalesce in Apache Spark

Aug 24, 2022

scala apache-spark distributed-computing

Change Iterable[(String, Double)] of an RDD to Array or List

Aug 21, 2022

scala apache-spark

Spark on embedded mode - user/hive/warehouse not found

Aug 31, 2022

hadoop apache-spark hive apache-spark-sql parquet

What happens if an RDD can't fit into memory in Spark? [duplicate]

Sep 02, 2021

scala hadoop apache-spark bigdata

How to upload files to new EMR cluster

Jun 19, 2022

python amazon-web-services apache-spark emr

PySpark: split a column into multiple columns without pandas

Jun 01, 2022

python apache-spark pyspark apache-spark-sql

spark.storage.memoryFraction setting in Apache Spark

Oct 24, 2022

java python apache-spark hadoop-yarn

Spark returns error "libsnappyjava.so: failed to map segment from shared object: Operation not permitted"

Mar 04, 2022

java hadoop apache-spark hive snappy

How to convert a sparse vector to dense in Scala Spark?

Sep 07, 2022

scala apache-spark apache-spark-mllib

New posts in apache-spark