apache-spark tutorials and guides

How to configure Hive to use Spark?

Dec 06, 2022

How to execute spark-shell from file with nohup?

Dec 06, 2022

apache-spark

How to use SQL query to define table in dbtable?

Dec 05, 2022

jdbc apache-spark apache-spark-sql

How to create an empty DataFrame in Spark

Dec 06, 2022

scala apache-spark apache-spark-sql avro spark-avro

PySpark random forest feature importance mapping after column transformations

Dec 05, 2022

apache-spark pyspark apache-spark-sql apache-spark-mllib

Describe a DataFrame in PySpark

Dec 06, 2022

python pandas apache-spark pyspark

Why does spark-ec2 fail with ERROR: Could not find any existing cluster?

Dec 04, 2022

amazon-web-services amazon-ec2 apache-spark

Using Scala to dump results processed by Spark to HDFS

Dec 05, 2022

scala hadoop hdfs apache-spark

Serializing RDD

Dec 05, 2022

java apache-spark rdd

Creating Spark application using wrong Scala version

Dec 05, 2022

scala apache-spark sbt

How to calculate cumulative sum using sqlContext

Dec 05, 2022

python apache-spark pyspark apache-spark-sql

Filter spark/scala dataframe if column is present in set

Dec 05, 2022

scala apache-spark filter spark-dataframe

How to filter Spark dataframe if one column is a member of another column

Dec 05, 2022

scala apache-spark dataframe apache-spark-sql

java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics

Dec 05, 2022

hadoop apache-spark

How to compute the percentile in a PySpark DataFrame for each key?

Dec 05, 2022

python apache-spark pyspark apache-spark-sql percentile

How to solve the PySpark `org.apache.arrow.vector.util.OversizedAllocationException` error by increasing Spark's memory?

Dec 05, 2022

apache-spark pyspark user-defined-functions apache-arrow

Dividing two columns of different DataFrames

Dec 04, 2022

python apache-spark pyspark apache-spark-sql

Dataframe from List<String> in Java

Dec 04, 2022

java apache-spark spark-dataframe

How to handle exceptions in Spark and Scala

Dec 04, 2022

scala apache-spark exception-handling

Concat multiple columns of a DataFrame using PySpark

Dec 04, 2022

apache-spark pyspark apache-spark-sql

New posts in apache-spark