
New posts in apache-spark

Yarn container is running out of memory

Apache Spark: How do I convert a Spark DataFrame to a RDD with type RDD[(Type1,Type2, ...)]?

scala apache-spark

Error when creating a StreamingContext

Register UDF to SqlContext from Scala to use in PySpark

pandas str.contains in a PySpark DataFrame

apache-spark pyspark

How to define Kafka (data source) dependencies for Spark Streaming?

Spark 2.0 DataSets groupByKey and divide operation and type safety

SPARK, DataFrame: difference of Timestamp columns over consecutive rows

spark kafka producer serializable

SPARK: YARN kills containers for exceeding memory limits

apache-spark hadoop-yarn

Sort by dateTime in scala

scala apache-spark rdd

Spark DataFrames: Reducing By Key

How to reference a dataframe when in an UDF on another dataframe?

NullPointerException in org.apache.spark.ml.feature.Tokenizer

How to use Scala UDF in PySpark?

Scala/Spark dataframes: find the column name corresponding to the max

Apache Spark how to append new column from list/array to Spark dataframe

Pyspark: Is there an equivalent method to pandas info()?

Getting last value of group in Spark

How to read streaming data in XML format from Kafka?