
New posts in apache-spark-sql

Is there a data architecture for efficient joins in Spark (a la RedShift)?

How to use correlation in Spark with Dataframes?

How to fix 'DataFrame' object has no attribute 'coalesce'?

Spark Streaming Exception: java.util.NoSuchElementException: None.get

SparkSQL - accessing nested structures Row(field1, field2=Row(..))

spark-submit: SQLContext CREATE statement does not work

pyspark: "too many values" error after repartitioning

Defining DateType conversion for DataFrame schema in Spark

Why would one use DataFrame.select over DataFrame.rdd.map (or vice versa)?

FIRST() or LAST() Aggregate Function in HIVE

Spark SQL: Joining two DataFrames/Datasets with the same column name

How to convert RDD of custom Java class objects to a DataFrame with toDF()?

PySpark reversing StringIndexer in nested array

Custom Partitioner in PySpark 2.1.0

Possible to filter Spark dataframe by ISNUMERIC function?

How to compose column name using another column's value for withColumn in Scala Spark

Adding a column of row sums across a list of columns in a Spark DataFrame

PySpark: Take average of a column after using filter function

Can we load Parquet file into Hive directly?

How to avoid shuffles while joining DataFrames on unique keys?