apache-spark tutorials and guides

Get size of Parquet file in HDFS for repartition with Spark in Scala

Oct 15, 2022

Spark on Java - What is the right way to have a static object on all workers

Mar 30, 2022

java static apache-spark

DataFrame explode list of JSON objects

Oct 15, 2022

scala apache-spark apache-spark-sql distributed-computing

EMR spark-shell not picking up jars

Nov 08, 2022

amazon-s3 apache-spark emr

What happens if the data can't fit in memory with cache() in Spark?

Feb 07, 2022

apache-spark cluster-computing distributed-computing

Memory issue when importing parquet files in Spark

Jan 24, 2022

scala apache-spark apache-spark-sql parquet

Is it possible to obtain specific message offset in Kafka+SparkStreaming?

Oct 29, 2022

apache-spark apache-kafka spark-streaming kafka-consumer-api

OneHotEncoder in Spark Dataframe in Pipeline

Aug 20, 2022

scala apache-spark apache-spark-sql apache-spark-mllib apache-spark-ml

How to plot ROC curve and precision-recall curve from BinaryClassificationMetrics

Sep 16, 2022

apache-spark machine-learning apache-spark-mllib

Spark on YARN: too few vcores used

Aug 26, 2022

apache-spark hadoop-yarn hortonworks-data-platform resource-management

Java FlatMapFunction in Spark: error: is not abstract and does not override abstract method call(String) in FlatMapFunction

Jun 02, 2019

java apache-spark

How to use User Defined Types in Spark 2.0?

Oct 28, 2022

scala apache-spark user-defined-types

How to create encoder for custom Java objects?

May 15, 2022

java apache-spark apache-spark-2.0

How to partition Spark RDD when importing Postgres using JDBC?

Oct 29, 2022

postgresql jdbc apache-spark pyspark rdd

Using typesafe config with Spark on Yarn

Jun 25, 2022

scala apache-spark hadoop-yarn typesafe-config

How to avoid boxing bytes in array in custom datasource?

Dec 15, 2018

scala apache-spark apache-spark-sql

Spark: grouping rows in array by key

Oct 16, 2022

scala hadoop apache-spark

Converting MySQL table to Spark Dataset is very slow compared to same from CSV file

Oct 15, 2022

java mysql apache-spark jdbc amazon-s3

PySpark: cast array with nested struct to string

Oct 19, 2022

python sql apache-spark pyspark spark-dataframe

Modify Spark DataFrame column

Aug 19, 2022

apache-spark dataframe
