apache-spark-sql tutorials

Is there a data architecture for efficient joins in Spark (à la Redshift)?

Oct 31, 2022

How to use correlation in Spark with Dataframes?

Oct 31, 2022

python apache-spark pyspark apache-spark-sql correlation

How to fix 'DataFrame' object has no attribute 'coalesce'?

Oct 31, 2022

python apache-spark dataframe pyspark apache-spark-sql

Spark Streaming Exception: java.util.NoSuchElementException: None.get

Oct 31, 2022

apache-spark hadoop apache-kafka apache-spark-sql spark-streaming

SparkSQL - accessing nested structures Row(field1, field2=Row(..))

Oct 22, 2022

nested apache-spark-sql

Spark-submit SQLContext CREATE statement does not work

Oct 21, 2022

scala apache-spark spark-streaming apache-spark-sql

pyspark: "too many values" error after repartitioning

Oct 21, 2022

python apache-spark apache-spark-sql pyspark rdd

Defining DateType conversion for DataFrame schema in Spark

Oct 20, 2022

scala apache-spark-sql

Why would one use DataFrame.select over DataFrame.rdd.map (or vice versa)?

Oct 20, 2022

performance apache-spark dataframe apache-spark-sql rdd

FIRST() or LAST() aggregate function in Hive

Oct 20, 2022

mysql apache-spark hive apache-spark-sql spark-dataframe

Spark SQL: joining two DataFrames/Datasets with the same column name

Oct 19, 2022

java apache-spark apache-spark-sql apache-spark-dataset

How to convert RDD of custom Java class objects to a DataFrame with toDF()?

Oct 18, 2022

scala apache-spark apache-spark-sql

PySpark reversing StringIndexer in nested array

Oct 19, 2022

python apache-spark pyspark apache-spark-sql apache-spark-ml

Custom Partitioner in Pyspark 2.1.0

Oct 19, 2022

python pyspark apache-spark-sql

Possible to filter Spark dataframe by ISNUMERIC function?

Oct 19, 2022

scala apache-spark apache-spark-sql

How to compose column name using another column's value for withColumn in Scala Spark

Sep 22, 2022

scala apache-spark apache-spark-sql

Adding a column of row sums across a list of columns in a Spark DataFrame

Oct 03, 2022

scala apache-spark dataframe apache-spark-sql

PySpark: Take average of a column after using filter function

Sep 16, 2022

python apache-spark pyspark apache-spark-sql

Can we load Parquet file into Hive directly?

Jun 26, 2022

hadoop hive apache-spark-sql hiveql parquet

How to avoid shuffles while joining DataFrames on unique keys?

Oct 15, 2022

apache-spark apache-spark-sql
