New posts in apache-spark

Spark Standalone Mode: Change replication factor of HDFS output

scala hdfs apache-spark

org.apache.spark.sql.Row to Int

scala apache-spark

sbt won't assemble Spark

Reading the latest offsets in Spark Kafka streaming

How to launch Spark's ApplicationMaster on a particular node in YARN cluster?

apache-spark hadoop-yarn

Update a collection in MongoDB via Apache Spark using the Mongo-Hadoop connector

java mongodb apache-spark rdd

How to run spark-notebook on Docker on Mac OS X?

Running Spark inside IntelliJ IDEA: HttpServletResponse ClassNotFoundException

How to print <String, Array[]> as a flat pair?

java apache-spark

value join is not a member of org.apache.spark.rdd.RDD

scala apache-spark

Running a Spark application on YARN, without spark-submit

apache-spark hadoop-yarn

Specify options for the jvm launched by pyspark

Apache Spark Task not Serializable

Performing a sum on an RDD of Int arrays

Can't zip RDDs with unequal numbers of partitions

apache-spark rdd

"java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext" when executing Spark Streaming

SparkDeploySchedulerBackend Error: Application has been killed. All masters are unresponsive

Apache Spark and node.js

Spark SQL PostgreSQL DataFrame partitions

How to use PySpark MLlib RegressionMetrics with real predictions