apache-spark tutorials and guides

Low CPU usage while running a Spark job

Oct 28, 2022

java apache-spark cpu-usage

How to use a predicate while reading from a JDBC connection?

Mar 19, 2022

r apache-spark jdbc sparklyr

Using Dataset.repartition in Spark 2 - several tasks handle more than one partition

Nov 01, 2022

apache-spark spark-streaming apache-spark-dataset

Does CrossValidator in PySpark distribute the execution?

Oct 17, 2022

apache-spark machine-learning parameters pyspark

Spark, Scala - How to get the top 3 values from each group of two columns in a DataFrame [duplicate]

Jul 03, 2022

scala apache-spark apache-spark-sql

PATH issue: Could not find valid SPARK_HOME while searching

Jan 25, 2020

ubuntu apache-spark path

How to (equally) partition array data in a Spark DataFrame

Aug 23, 2022

scala apache-spark

Spark UDF not running in parallel

Aug 22, 2022

python apache-spark pyspark databricks

Spark window function on a DataFrame with a large number of columns

Aug 28, 2022

apache-spark spark-dataframe

Passing multiple system properties to a Google Dataproc cluster job

Aug 22, 2022

apache-spark google-cloud-platform gcloud google-cloud-dataproc

What is the difference between a "stateful" and "stateless" system?

Oct 15, 2022

apache-spark streaming spark-streaming state apache-flink

Spark Structured Streaming app has no jobs and no stages

Oct 30, 2022

apache-spark apache-kafka spark-structured-streaming

Spark Structured Streaming Blue/Green Deployments

Nov 13, 2022

apache-spark hadoop deployment spark-structured-streaming blue-green-deployment

Error handling with Try match inside a UDF - and log the row where it failed

Nov 06, 2022

scala apache-spark dataframe error-handling user-defined-functions

Spark pivot groupby performance very slow

Dec 10, 2021

apache-spark dataframe group-by pivot

Recommended way to access HBase using Scala

Oct 17, 2022

scala apache-spark hbase apache-flink scalding

PySpark SQL: Create a new column based on whether a value exists in a different DataFrame's column

Sep 05, 2022

python apache-spark pyspark pyspark-sql

XML processing in Spark

Aug 22, 2022

apache-spark

How to pass variables in Spark SQL using Python?

Aug 23, 2022

python apache-spark pyspark apache-spark-sql

Difference when serializing a lazy val with or without @transient

Sep 06, 2022

scala serialization apache-spark lazy-initialization transient
