apache-spark tutorials and guides

Is there a way to change the replication factor of RDDs in Spark?

Jun 30, 2022

How to compare multiple rows?

Nov 06, 2019

scala apache-spark spark-streaming apache-spark-sql

Sending a large CSV to Kafka using Python Spark

Mar 19, 2022

python apache-spark apache-kafka pyspark kafka-python

Using groupBy in Spark and getting back to a DataFrame

Nov 02, 2022

scala apache-spark apache-spark-sql

Add Yarn cluster configuration to Spark application

Jun 09, 2019

scala hadoop apache-spark hadoop-yarn

How to pass additional parameters to user-defined methods in PySpark for the filter method?

Dec 31, 2021

python apache-spark pyspark

How to read parquet files using `ssc.fileStream()`? What are the types passed to `ssc.fileStream()`?

May 18, 2021

scala hadoop apache-spark spark-streaming hadoop2

Replace newline (\n) characters in a CSV file - Spark Scala

Oct 21, 2022

scala replace apache-spark character newline

Why are "sc.addFile" and "spark-submit --files" not distributing a local file to all workers?

Aug 21, 2021

file apache-spark cluster-computing distribute

How can I read a binary file from HDFS into a Spark DataFrame?

Sep 07, 2022

python hadoop numpy apache-spark spark-dataframe

How to get date and time from string?

Dec 06, 2018

scala date apache-spark apache-spark-sql

Conflict between httpclient version and Apache Spark

Jan 05, 2021

java apache-spark amazon-ec2 apache-httpclient-4.x

PySpark: expected zero arguments for construction of ClassDict (for pyspark.mllib.linalg.DenseVector)

Dec 09, 2021

apache-spark pyspark apache-spark-sql user-defined-functions apache-spark-mllib

Install Spark on an existing Hadoop cluster

Sep 27, 2022

linux hadoop apache-spark

How to register S3 Parquet files in a Hive Metastore using Spark on EMR

Nov 15, 2022

apache-spark hive elastic-map-reduce apache-spark-1.6

Create a Hive external table with a schema in Spark

Nov 14, 2021

apache-spark hive apache-spark-sql spark-avro

PySpark command not recognised

May 02, 2022

python apache-spark pyspark

Scala: How to get a range of rows in a DataFrame

Nov 18, 2022

scala apache-spark dataframe

PySpark: casting string to float when reading a CSV file

Nov 03, 2022

python apache-spark pyspark

Creating a Spark DataFrame from a single string

Aug 23, 2022

scala apache-spark spark-dataframe
