I have data like this:
pid recom-pid
1 1
1 2
1 3
2 1
2 2
2 4
2 5
I need to turn it into:
pid, recommendations
1 2,3
2 1,4,5
That is, ignore the pid itself in the second column and join the remaining values into a comma-separated string. The data is tab-separated.
I tried variations of the following, but I am not sure how to refer to productId inside the foldLeft:
.groupBy('productId) {
_.foldLeft(('prodReco) -> 'prodsR)("") {
(s: String, s2: String) =>
{
println(" s " + s + ", s2 :" + s2 + "; pid :" + productId + ".")
if (productId.equals(s2)) {
s
} else {
s + "," + s2;
}
}
}
}
I am using Scala 2.10 with Scalding 0.10.0 and Cascading 2.5.3, and I need a Scalding answer. I know how to manipulate the data in plain Scala. I am just wondering how to get hold of the columns during a groupBy in Scalding and use them to conditionally do a foldLeft, or some other means of getting the filtered output.
For a full working sample see https://github.com/tgkprog/scaldingEx2/tree/master/Q1
Instead of groupBy and then foldLeft, use just foldLeft.
Here is a solution using Scala collections, but it should work in Scalding as well:
val source = List((1,1), (1,2), (1,3), (2,1), (2,2), (2,4), (2,5))
source.foldLeft(Map[Int, List[Int]]())((m,e) =>
if (e._1 == e._2) m else m + (e._1 -> (e._2 :: m.getOrElse(e._1, List()))))
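Note that prepending with :: builds each list in reverse encounter order, and a pid that only ever recommends itself gets no entry at all. As a minimal sketch in plain Scala collections (not Scalding), assuming Int pairs as above and using the hypothetical names FoldRecos, collect and toLines, the fold can be extended to restore the order and render the tab-separated lines from the question:

```scala
object FoldRecos {
  // Hypothetical helper: fold the pairs into pid -> recommendations,
  // skipping rows where a pid recommends itself.
  def collect(pairs: Seq[(Int, Int)]): Map[Int, List[Int]] =
    pairs.foldLeft(Map.empty[Int, List[Int]]) { case (m, (pid, reco)) =>
      if (pid == reco) m
      else m + (pid -> (reco :: m.getOrElse(pid, Nil)))
    }.map { case (pid, recos) => pid -> recos.reverse } // undo the prepend order

  // Render one "pid<TAB>r1,r2,..." line per pid, sorted by pid.
  def toLines(pairs: Seq[(Int, Int)]): Seq[String] =
    collect(pairs).toSeq.sortBy(_._1).map { case (pid, recos) =>
      s"$pid\t${recos.mkString(",")}"
    }
}
```

Calling FoldRecos.toLines(source) on the list above should give one line for pid 1 and one for pid 2.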
Just a groupBy and a map should be enough to accomplish what you want.
// Input data formatted as a list of tuples.
val tt = Seq((1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 4), (2, 5))
tt
.groupBy(_._1) // Map(2 -> List((2, 1), ...), 1 -> List((1, 1), ...))
.toSeq // for easier mapping
.map({
case (pid, recomPids) => {
val pids = recomPids.collect({
case recomPid if recomPid._2 != pid => recomPid._2
})
(pid, pids)
}
}) // List((2, List(1, 4, 5)), (1, List(2, 3)))
I simplified the input/output form to just focus on getting the collections into the right form.
Assume the data is saved as pid|recom-pid lines in temp.txt, and so
import scala.io.Source
val xs = Source.fromFile("temp.txt").getLines.toArray.map(_.split("\\|"))
We convert xs into tuples, like this
val pairs = for (Array(pid, recom) <- xs) yield (pid,recom)
Array((1,1), (1,2), (1,3), (2,1), (2,2), (2,4), (2,5))
and group by the first element,
val g = pairs.groupBy(_._1)
Map(2 -> Array((2,1), (2,2), (2,4), (2,5)), 1 -> Array((1,1), (1,2), (1,3)))
Then we remove the identity tuples from each mapped value. This always leaves an entry in the map, where an empty array denotes that there was only the identity tuple (e.g. a single occurrence of 3|3 would lead to 3 -> Array()),
val res = g.mapValues(_.filter { case (a,b) => a != b } )
Map(2 -> Array((2,1), (2,4), (2,5)), 1 -> Array((1,2), (1,3)))
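The map above still holds (pid, recom) tuples rather than the comma-separated string asked for. A possible last step, sketched under the assumption that res has the type Map[String, Array[(String, String)]] shown above (RenderRecos and render are hypothetical names):

```scala
object RenderRecos {
  // Keep only the second element of each tuple and join with commas.
  // Pids are Strings here, so the sort is lexicographic, not numeric.
  def render(res: Map[String, Array[(String, String)]]): Seq[String] =
    res.toSeq.sortBy(_._1).map { case (pid, pairs) =>
      s"$pid\t${pairs.map(_._2).mkString(",")}"
    }
}
```

A pid whose array is empty is rendered with an empty recommendation column rather than being dropped.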
Assuming your input string s is well-formed, this returns a Map[String, Array[String]]:
s.split('\n')
.map(_.split("\\|"))
.groupBy(_(0))
.mapValues(_.flatten)
.transform {case (k, v) ⇒ v.filter(_ != k)}
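The question states that the data is tab-separated, while the snippet above splits on |. A rough variant for tab-separated input that also joins the values into the requested string might look like this. It is only a sketch in plain Scala, assuming every non-empty line has exactly two columns; TabRecos and recommend are hypothetical names:

```scala
object TabRecos {
  // Parse "pid<TAB>recom-pid" lines and build pid -> "r1,r2,...".
  // Assumes each non-empty line has two tab-separated columns.
  def recommend(s: String): Map[String, String] =
    s.split('\n')
      .map(_.trim)
      .filter(_.nonEmpty)            // skip blank lines
      .map(_.split("\t"))
      .groupBy(_(0))                 // group rows by pid
      .map { case (pid, rows) =>
        pid -> rows.map(_(1)).filter(_ != pid).mkString(",")
      }
}
```

Since groupBy keeps the rows of each group in their original order, the recommendations come out in input order.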