apache-spark tutorials and guides

What to specify as Spark master when running on Amazon EMR

Mar 12, 2023

apache-spark amazon-emr

NoSuchMethodError when using Spark and IntelliJ

Mar 12, 2023

scala apache-spark intellij-idea jvm

Iterating an RDD and updating a mutable collection returns an empty collection

Mar 13, 2023

scala apache-spark bigdata

PySpark: [Errno 8] nodename nor servname provided, or not known

Mar 12, 2023

python apache-spark pyspark

Print ALL defined variables/method signatures in Spark Shell - Scala REPL

Mar 12, 2023

scala shell apache-spark

How to get the coefficients of the best logistic regression in a spark-ml CrossValidatorModel?

Mar 11, 2023

scala apache-spark logistic-regression cross-validation apache-spark-ml

How to use Spark intersection() by key or filter() with two RDDs?

Mar 12, 2023

scala apache-spark filter rdd intersection

PySpark: Get top k columns for each row in dataframe

Mar 11, 2023

python apache-spark dataframe pyspark apache-spark-sql

How to unnest array with keys to join on afterwards?

Mar 11, 2023

apache-spark hive apache-spark-sql hiveql

What is the difference between transformations and RDD functions in Spark?

Mar 12, 2023

scala apache-spark rdd

How to find longest sequence of consecutive dates?

Mar 11, 2023

apache-spark apache-spark-sql

Join two Spark mllib pipelines together

Mar 10, 2023

python scala apache-spark apache-spark-mllib apache-spark-ml

Why does word2vec only take one task for mapPartitionsWithIndex at Word2Vec.scala:323?

Mar 11, 2023

scala apache-spark apache-spark-mllib word2vec

Spark Scala: moving average for multiple columns

Mar 11, 2023

scala apache-spark

Connect Amazon EMR Spark with MySQL (writing data)

Mar 10, 2023

mysql apache-spark pyspark jdbc amazon-emr

What is the relation between numFeatures in HashingTF in Spark MLlib and actual number of terms in a document?

Mar 11, 2023

apache-spark machine-learning apache-spark-mllib tf-idf

Oozie workflow: launch Spark job on a particular queue

Mar 11, 2023

apache-spark oozie oozie-workflow

Spark Dataset: Filter if value is contained in other dataset

Mar 09, 2023

java apache-spark apache-spark-sql apache-spark-dataset

Partial/Full-match value in one RDD to values in another RDD

Mar 09, 2023

scala apache-spark apache-spark-sql pattern-matching

object ml is not a member of package org.apache.spark

Mar 11, 2023

apache-spark sbt apache-spark-mllib
