apache-spark-sql tutorials

How to read bz2 files into dataframes using pyspark?

Nov 18, 2022

Spark HiveContext does not retrieve newly inserted records from Hive Table

May 15, 2021

apache-spark-sql

In Apache Spark SQL, how to close the metastore connection from HiveContext

Oct 17, 2022

apache-spark thrift apache-spark-sql apache-spark-1.4

Spark partitionBy much slower than without it

Sep 15, 2022

scala apache-spark apache-spark-sql parquet

Spark Dataframe Maximum Column Count

Apr 02, 2022

apache-spark pyspark apache-spark-sql

Spark SQL: INSERT INTO statement syntax

Sep 19, 2022

apache-spark apache-spark-sql

Spark concurrent writes on same HDFS location

Apr 25, 2022

apache-spark hadoop apache-spark-sql hdfs apache-nifi

AWS EMR: Pyspark: Rdd: mappartitions: Could not find valid SPARK_HOME while searching: Spark closures

May 22, 2022

apache-spark pyspark apache-spark-sql python-requests amazon-emr

PySpark: Cumulative Sum with reset condition

Jan 09, 2022

apache-spark pyspark apache-spark-sql cumulative-sum

Structured Streaming and Splitting nested data into multiple datasets

Oct 28, 2022

apache-spark apache-kafka apache-spark-sql spark-structured-streaming

Spark SQL - Encoders for Tuple Containing a List or Array as an Element

Apr 10, 2020

java apache-spark apache-spark-sql spark-dataframe

PySpark No suitable driver found for jdbc:mysql://dbhost

Mar 12, 2018

apache-spark apache-spark-sql pyspark

Saving Spark DataFrames with nested User Data Types

Oct 27, 2019

apache-spark apache-spark-sql

Performance of loading parquet files into case classes in Spark

Oct 25, 2022

scala apache-spark apache-spark-sql parquet

Why does SparkSQL require two literal escape backslashes in the SQL query?

Nov 10, 2022

apache-spark apache-spark-sql apache-spark-2.0

Outer join two Datasets (not DataFrames) in Spark Structured Streaming

Aug 28, 2022

scala apache-spark apache-spark-sql spark-structured-streaming

Access AWS Glue from local Spark

May 15, 2022

amazon-web-services apache-spark apache-spark-sql aws-glue
