New posts in apache-spark

Spark Master filling temporary directory

apache-spark

Counting distinct substring occurrences in column for every row in PySpark?

Processing data stored in Redshift

Writing DataFrame as parquet creates empty files

Spark Connection refused for BlockManager process

Spark saveAsTextFile to Azure Blob creates a blob instead of a text file

Compatibility issue with Scala and Spark for compiled jars

Exception in thread "main" java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$

How to spark-submit to ZooKeeper-managed Mesos cluster (gives java.net.UnknownHostException: zk for mesos://zk:// master URL)?

apache-spark mesos

Dataproc CPU usage too low even though all the cores got used

How to use groupBy, collect_list, arrays_zip, and explode together in PySpark to solve a certain business problem

apache-spark pyspark

Oozie Spark action failed for kerberos environment

Spark streaming job doesn't delete shuffle files

Spark RDD: How to calculate statistics most efficiently?

Explode column with array of arrays - PySpark