apache-spark-sql tutorials

Writing RDD partitions to individual Parquet files, each in its own directory

Nov 01, 2022

Getting the first value from spark.sql.Row

Sep 13, 2022

apache-spark apache-spark-sql

UDFs vs Spark SQL vs column expressions: performance optimization

Aug 25, 2022

scala apache-spark apache-spark-sql spark-dataframe

Spark Structured Streaming - update DataFrame's schema on the fly

Oct 14, 2019

apache-spark apache-spark-sql schema spark-structured-streaming

Setting up a Spark SQL connection with Kerberos

Sep 05, 2022

java apache-spark apache-spark-sql kerberos

Should I persist a Spark DataFrame if I keep adding columns to it?

Oct 29, 2022

scala apache-spark dataframe apache-spark-sql persist

Read a bytes column in Spark

Oct 25, 2022

apache-spark encoding pyspark apache-spark-sql

Disable Spark Catalyst optimizer

Sep 27, 2022

apache-spark optimization apache-spark-sql spark-dataframe query-optimization

Databricks SQL - How to get all the rows (more than 1000) in the first run?

Apr 24, 2022

sql apache-spark-sql databricks

mismatched input 'from' expecting <EOF> SQL

Oct 29, 2022

sql apache-spark-sql

When to use Spark DataFrame/Dataset API and when to use plain RDD?

Oct 25, 2022

apache-spark apache-spark-sql spark-dataframe apache-spark-dataset

Avoid starting HiveThriftServer2 with created context programmatically

Apr 24, 2022

hadoop apache-spark hive apache-spark-sql apache-spark-2.0

NullPointerException after extracting a Teradata table with Scala/Spark

Mar 08, 2019

scala apache-spark dataframe apache-spark-sql teradata

Spark: How to get the number of keys changed between two JSONs in Scala?

Mar 30, 2021

json scala apache-spark apache-spark-sql

How do I enable partition pruning in Spark?

Jun 26, 2019

apache-spark apache-spark-sql spark-dataframe pruning

How to match DataFrame column names to Scala case class attributes?

Mar 13, 2022

scala apache-spark apache-spark-sql parquet
