
New posts in apache-spark

How do I get a SQL row_number equivalent for a Spark RDD?

Understanding spark physical plan

AssertionError: col should be Column

Encode and assemble multiple features in PySpark

Condition in map function

How to calculate sum and count in a single groupBy?

How to create a udf in PySpark which returns an array of strings?

Why does starting StreamingContext fail with “IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute”?

Rolling your own reduceByKey in Spark Dataset

In Apache Spark, why does RDD.union not preserve the partitioner?

PySpark and broadcast join example

Spark union column order

How to find Spark's installation directory?

Join two ordinary RDDs with/without Spark SQL

Multiple condition filter on dataframe

Left Anti join in Spark?

SQL query in Spark/scala Size exceeds Integer.MAX_VALUE

Why does Spark application fail with “ClassNotFoundException: Failed to find data source: kafka” as uber-jar with sbt assembly?

Is it possible to alias columns programmatically in Spark SQL?

How to add a new library like spark-csv to the prebuilt version of Apache Spark?