New posts in apache-spark-sql

Spark dataframe join with range slow

Spark DataFrame - Read pipe-delimited file using SQL?

Spark SQL UDF throwing NullPointerException when adding a filter on a column that uses that UDF

Spark SQL alternatives to groupby/pivot/agg/collect_list using foldLeft & withColumn so as to improve performance

Last Access Time Update in Hive metastore

Reshape Spark DataFrame from Long to Wide On Large Data Sets

"You need to build Spark before running this program" error when running bin/pyspark

How to connect spark-shell to Mesos?

Iterating/looping over Spark parquet files in a script results in memory error/build-up (using Spark SQL queries)

Scala Spark - creating nested json output from simple dataframe

How to query a DataFrame where one StringType field holds a JSON value in Spark SQL

Spark ML Pipeline Causes java.lang.Exception: failed to compile ... Code ... grows beyond 64 KB

Transforming one column into multiple ones in a Spark Dataframe

Why is a join in Spark so slow in local mode?

Aggregate sparse vector in PySpark

JSON Struct to Map[String,String] using sqlContext

PySpark corr for each group in a DataFrame (more than 5K columns)

Is there a data architecture for efficient joins in Spark (a la RedShift)?

What are the mandatory options for loading an Excel file?

Why does format("kafka") fail with "Failed to find data source: kafka." (even with uber-jar)?