apache-spark-sql tutorials

How to extract complex JSON structures using Apache Spark 1.4.0 DataFrames

Nov 21, 2022

apache-spark apache-spark-sql

Apache Spark: In Spark SQL, are SQL queries vulnerable to SQL injection?

Apr 05, 2022

hadoop apache-spark hive apache-spark-sql bigdata

rank() function usage in Spark SQL

Sep 04, 2018

java apache-spark apache-spark-sql window-functions rank

How to convert a group-by result to a DataFrame

Nov 21, 2022

scala apache-spark apache-spark-sql

How can you update values in a dataset?

Aug 25, 2022

apache-spark apache-spark-sql

How to add sparse vectors after group by, using Spark SQL?

Sep 16, 2022

python apache-spark machine-learning apache-spark-sql pyspark-sql

How to compute statistics on a streaming DataFrame for different types of columns in a single query?

Sep 24, 2022

scala apache-spark apache-spark-sql spark-structured-streaming

PySpark: java.lang.OutOfMemoryError: GC overhead limit exceeded

Nov 08, 2022

apache-spark pyspark apache-spark-sql

How to write a DataFrame with duplicate column names to a CSV file in PySpark

Sep 05, 2022

apache-spark pyspark apache-spark-sql apache-spark-2.0

Spark - Non-time-based windows are not supported on streaming DataFrames/Datasets;

Sep 14, 2022

java apache-spark apache-spark-sql spark-streaming

Why does Spark groupBy.agg(min/max) of BigDecimal always return 0?

Nov 11, 2022

apache-spark apache-spark-sql bigdecimal

How do explicit table partitions in Databricks affect write performance?

Jun 26, 2022

amazon-s3 hive apache-spark-sql databricks delta-lake

Using partitions (with partitionBy) when writing a Delta Lake table has no effect

Apr 26, 2022

apache-spark apache-spark-sql partitioning mapr delta-lake

Why does joining structurally identical DataFrames give different results?

Sep 30, 2022

apache-spark join pyspark apache-spark-sql

How to collect Spark SQL output to a file?

Sep 12, 2022

scala apache-spark apache-spark-sql

Ever increasing physical memory for a Spark application in YARN

Mar 12, 2022

java hadoop memory apache-spark apache-spark-sql

How to persist sorted parquet tables for future sort merge joins?

Mar 30, 2022

apache-spark apache-spark-sql parquet

Error creating transactional connection factory while running a Spark on Hive project in IDEA

Jul 26, 2021

apache-spark hive apache-spark-sql metastore

Spark DataFrame: Remove the max value in a group

Mar 12, 2022

apache-spark dataframe apache-spark-sql

Spark Dataset: When to use Except vs. Left Anti Join

Nov 09, 2022

apache-spark apache-spark-sql anti-join