Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove redundant entries, scala way

Tags:

scala

Edit: Added the fact, that the list is sorted, and realizing 'duplicate' is misleading, replaced that with 'redundant' in the title.

I have a sorted list of entries stating a production value in a given interval. Entries stating the exact same value at a later time adds no information and can safely be left out.

case class Entry(minute:Int, production:Double)
val entries = List(Entry(0, 100.0), Entry(5, 100.0), Entry(10, 100.0), Entry(20, 120.0), Entry(30, 100.0), Entry(180, 0.0))

Experimenting with the scala 2.8 collection functions, so far I have this working implementation:

entries.foldRight(List[Entry]()) {
  (entry, list) => list match {
    case head :: tail if (entry.production == head.production) => entry :: tail
    case head :: tail => entry :: list
    case List() => entry :: List()
  }
}
res0: List[Entry] = List(Entry(0,100.0), Entry(20,120.0), Entry(30,100.0), Entry(180,0.0))

Any comments? Am I missing out on some scala magic?

like image 626
andersbohn Avatar asked Apr 03 '10 13:04

andersbohn


2 Answers

When you are comparing the consecutive entries in a list, start by zip-ping the list with its tail to get a list of pairs of consecutive elements.

Below, I take the first entry from the list, and use collect to simultaneously filter out pairs where the production is unchanged, and for the remaining pairs, map e2. (collect is new in Scala 2.8, and for a while was called partialMap)

scala> entries.head :: ((entries zip entries.tail).collect { 
           case (Entry(_, p1), e2@Entry(_, p2)) if p1 != p2 => e2 
       })
res13: List[Entry] = List(Entry(0,100.0), Entry(20,120.0), Entry(30,100.0), Entry(180,0.0))

UPDATE For simplicity, this assumes that entries is not empty.

like image 95
retronym Avatar answered Sep 21 '22 09:09

retronym


There's a new zipped method with Tuple2 that is more efficient (and lazier) than zip on lists for some operations. You might try this out on your benchmark--I don't know if it's actually faster, but it certainly could be (and it's definitely a lot shorter):

entries.take(1) :::
(entries,entries.drop(1)).zipped.filter(_.production != _.production)._2

Instead of zipping the list pairwise all the way through, it creates a view of the list where the pieces can be manipulated together, and then returns the manipulated lists. Note the use of take and drop to handle the empty case.

It's not super-efficient since it builds two lists when you really only need one, but it is a class of solution that hasn't come up yet.

like image 45
Rex Kerr Avatar answered Sep 21 '22 09:09

Rex Kerr