apache-spark tutorials and guides

Cumulative distinct count with Spark SQL

Nov 08, 2022

sql apache-spark apache-spark-sql

pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuild in Windows 10

Sep 06, 2022

apache-spark pyspark

How to handle categorical features in the latest Random Forest in Spark?

Sep 03, 2022

apache-spark apache-spark-mllib random-forest apache-spark-ml feature-engineering

What is the difference between sqlContext.read.load and sqlContext.read.text?

Sep 15, 2022

apache-spark pyspark apache-spark-sql spark-csv

Which would be a quicker (and better) tool for querying data stored in the Parquet format - Spark SQL, Athena or ElasticSearch?

Aug 21, 2022

performance apache-spark elasticsearch etl amazon-athena

How does a serialized RDD occupy less space in memory?

Feb 22, 2022

java apache-spark serialization

Error: Could not write class iw because it exceeds JVM code size limits. Method code too large

May 25, 2019

scala apache-spark apache-spark-sql

Scala: How to combine two data frames?

Aug 19, 2022

scala apache-spark apache-spark-sql

How to implement `except` in Apache Spark based on subset of columns?

Sep 15, 2022

scala apache-spark apache-spark-sql

How to convert a timestamp into a string (without changing the timezone)?

Jul 14, 2022

r apache-spark hive timestamp sparklyr

Update a DataFrame column with new values

Oct 29, 2022

apache-spark pyspark

How does YARN know about data locality in Apache Spark in cluster mode?

Oct 19, 2022

apache-spark hadoop-yarn

How do I run Spark jobs concurrently in the same AWS EMR cluster?

Jul 12, 2022

amazon-web-services apache-spark hadoop-yarn amazon-emr livy

S3 Slow Down exception for Spark program

Oct 15, 2022

apache-spark amazon-s3

Spark Dataframe upsert to Elasticsearch

May 18, 2022

scala apache-spark dataframe elasticsearch

How to cast an array of struct in a spark dataframe using selectExpr?

Mar 12, 2021

sql scala apache-spark dataframe apache-spark-sql

Can't resolve ... given input columns

Sep 15, 2021

apache-spark pyspark apache-spark-sql

Spark DataFrame is untyped vs. DataFrame has a schema?

Oct 19, 2022

apache-spark apache-spark-sql bigdata

Spark dataframe column naming conventions / restrictions

Feb 06, 2021

apache-spark hive pyspark naming-conventions amazon-athena

Extract and Visualize Model Trees from Sparklyr

Sep 01, 2021

r apache-spark random-forest decision-tree sparklyr