spark scala get uncommon map elements

Question

I am trying to split my dataset into training and test sets. I first read the file like this:

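import org.apache.spark.mllib.recommendation.Rating

// Parse each CSV line into an MLlib Rating(user, product, rating)
val ratings = sc.textFile(movieLensdataHome + "/ratings.csv").map { line =>
  val fields = line.split(",")
  Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
}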
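If ratings.csv starts with a header row such as `userId,movieId,rating,timestamp` (an assumption about your copy of the file), `fields(0).toInt` throws a NumberFormatException on that first line. A minimal sketch that skips any line whose first three fields do not parse; `RatingParser` and `parseRating` are hypothetical names, and the MLlib `Rating` case class is assumed to be the one used above.

```scala
import scala.util.Try
import org.apache.spark.mllib.recommendation.Rating

object RatingParser {
  // Hypothetical helper: returns None for a header row or any malformed
  // line instead of throwing, so bad lines can be dropped with flatMap.
  def parseRating(line: String): Option[Rating] = {
    val fields = line.split(",")
    if (fields.length < 3) None
    else Try(Rating(fields(0).trim.toInt, fields(1).trim.toInt, fields(2).trim.toDouble)).toOption
  }
}

// Usage inside Spark (assumes `sc` and `movieLensdataHome` from the question):
// val ratings = sc.textFile(movieLensdataHome + "/ratings.csv")
//   .flatMap(line => RatingParser.parseRating(line))
```

Using `flatMap` with an `Option` keeps the parse and the filtering in a single pass over the file.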

Then I select 80% of those for my training set:

// Sample ~80% of the ratings without replacement, seed = 1
val train = ratings.sample(false, 0.8, 1)

Is there an easy way to get the test set in a distributed way? I tried this, but it fails:

val test = ratings.filter(!_.equals(train.map(_)))

Shyamendra Solanki · Accepted Answer

val test = ratings.subtract(train)
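`RDD.subtract` returns the elements of `ratings` that do not appear in `train`, comparing elements by `equals`/`hashCode`; `Rating` is a case class, so records with equal fields compare equal. A minimal, self-contained sketch, assuming spark-core and spark-mllib on the classpath and a local master; `SubtractSplit`, `split` and the toy data are hypothetical. Two caveats worth knowing: `subtract` triggers a shuffle, and if `ratings` contains exact duplicate records, a record sampled into `train` removes every copy of it from the test set. Also, `sample` returns roughly, not exactly, the requested fraction.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

object SubtractSplit {
  // Training set: an approximate `fraction` sample of `ratings`.
  // Test set: every element of `ratings` that does not appear in the sample.
  def split(ratings: RDD[Rating], fraction: Double, seed: Long): (RDD[Rating], RDD[Rating]) = {
    val train = ratings.sample(false, fraction, seed) // withReplacement = false
    val test = ratings.subtract(train)                // shuffles; compares via equals/hashCode
    (train, test)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("subtract-split").setMaster("local[2]"))
    try {
      // Hypothetical toy data: 100 distinct ratings
      val ratings = sc.parallelize((1 to 100).map(i => Rating(i, i % 10, (i % 5).toDouble)))
      val (train, test) = split(ratings, 0.8, 1L)
      println(s"train = ${train.count()}, test = ${test.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Because `sample` is deterministic for a fixed seed and input partitioning, recomputing `train` inside `subtract` yields the same sample, so the two sets are disjoint.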

Oussama · Answer

Take a look here: http://markmail.org/message/qi6srcyka6lcxe7o

Here is the code:

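  import scala.reflect.ClassTag
  import org.apache.spark.rdd.RDD

  // Randomly split `data` in two: each element goes to the first RDD with
  // probability p and to the second otherwise.
  def split[T: ClassTag](data: RDD[T], p: Double,
      seed: Long = System.currentTimeMillis): (RDD[T], RDD[T]) = {
    val rand = new java.util.Random(seed)
    // One seed per partition, drawn once on the driver
    val partitionSeeds = data.partitions.map(_ => rand.nextLong)
    // Tag each element with a random number. Each partition's generator is
    // seeded from partitionSeeds, so recomputing `temp` for the second
    // filter reproduces the same tags and the two halves do not overlap.
    val temp = data.mapPartitionsWithIndex { (index, iter) =>
      val partitionRand = new java.util.Random(partitionSeeds(index))
      iter.map(x => (x, partitionRand.nextDouble))
    }
    (temp.filter(_._2 <= p).map(_._1), temp.filter(_._2 > p).map(_._1))
  }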
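Later Spark releases expose the same per-partition idea directly as `RDD.randomSplit(weights, seed)`, which normalizes the weights and returns one RDD per weight; as with the hand-written `split`, the results are disjoint as long as each partition is recomputed in the same order. A minimal sketch, assuming a local master; `RandomSplitExample`, `trainTest` and the toy data are hypothetical names for illustration, and caching the input is a precaution rather than a requirement.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RandomSplitExample {
  // randomSplit normalizes the weights and returns one RDD per weight; with a
  // deterministic input, every element ends up in exactly one of them.
  def trainTest[T](data: RDD[T], trainFraction: Double, seed: Long): (RDD[T], RDD[T]) = {
    val Array(train, test) = data.randomSplit(Array(trainFraction, 1.0 - trainFraction), seed)
    (train, test)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("random-split").setMaster("local[2]"))
    try {
      // Hypothetical toy data; cached so both splits read the same materialized input
      val data = sc.parallelize(1 to 1000).cache()
      val (train, test) = trainTest(data, 0.8, 1L)
      println(s"train = ${train.count()}, test = ${test.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Unlike `subtract`, this needs no shuffle and keeps duplicate records, since each element is assigned to a split independently.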


Tags: machine-learning, scala, apache-spark

Asked by venuktan
