Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simplest way to get the top n elements of a Scala Iterable

Is there a simple and efficient solution to determine the top n elements of a Scala Iterable? I mean something like

iter.toList.sortBy(_.myAttr).take(2) 

but without having to sort all elements when only the top 2 are of interest. Ideally I'm looking for something like

iter.top(2, _.myAttr) 

see also: Solution for the top element using an Ordering: In Scala, how to use Ordering[T] with List.min or List.max and keep code readable

Update:

Thank you all for your solutions. Finally, I took the original solution of user unknown and adopted it to use Iterable and the pimp-my-library pattern:

implicit def iterExt[A](iter: Iterable[A]) = new {   def top[B](n: Int, f: A => B)(implicit ord: Ordering[B]): List[A] = {     def updateSofar (sofar: List [A], el: A): List [A] = {       //println (el + " - " + sofar)        if (ord.compare(f(el), f(sofar.head)) > 0)         (el :: sofar.tail).sortBy (f)       else sofar     }      val (sofar, rest) = iter.splitAt(n)     (sofar.toList.sortBy (f) /: rest) (updateSofar (_, _)).reverse   } }  case class A(s: String, i: Int) val li = List (4, 3, 6, 7, 1, 2, 9, 5).map(i => A(i.toString(), i)) println(li.top(3, _.i)) 
like image 649
Stefan Endrullis Avatar asked Apr 15 '11 09:04

Stefan Endrullis


People also ask

How do you iterate through a list in Scala?

After collections like Lists, Sets, and Maps, we will discuss a way to iterate on them (how we can access their elements one by one). Two important methods in Scala Iterator are next () and hasNext. hasNext will tell if there is another element to return, while next () will return that element.

How to find the smallest value in a Scala iterator?

Also, once it reaches the end of the iterator, the iterator is now empty. We can use it.min and it.max to determine the largest and smallest elements in a Scala iterator. This will return the smallest value in the Scala iterator. scala> val it=Iterator (3,2,4,9,7) it: Iterator [Int] = non-empty iterator scala> it.min res1: Int = 2

How to add elements of an iterator to a StringBuilder in Scala?

This appends the elements of the Scala iterator to a String Builder and returns it. scala> val it=Iterator (3,2,4,9,7) it: Iterator [Int] = non-empty iterator scala> it.addString (new StringBuilder ()) res18: StringBuilder = 32497 This lets us include a separator for the above functionality.

How do you get the next element of an iterator?

A call to it.next () will return the next element of the iterator and advance the state of the iterator. You can find out whether there are more elements to return using Iterator's it.hasNext method. The most straightforward way to "step through" all the elements returned by an iterator is to use a while loop.


2 Answers

My solution (bound to Int, but should be easily changed to Ordered (a few minutes please):

def top (n: Int, li: List [Int]) : List[Int] = {    def updateSofar (sofar: List [Int], el: Int) : List [Int] = {     // println (el + " - " + sofar)     if (el < sofar.head)        (el :: sofar.tail).sortWith (_ > _)      else sofar   }    /* better readable:     val sofar = li.take (n).sortWith (_ > _)     val rest = li.drop (n)     (sofar /: rest) (updateSofar (_, _)) */       (li.take (n). sortWith (_ > _) /: li.drop (n)) (updateSofar (_, _))  } 

usage:

val li = List (4, 3, 6, 7, 1, 2, 9, 5)     top (2, li) 
  • For above list, take the first 2 (4, 3) as starting TopTen (TopTwo).
  • Sort them, such that the first element is the bigger one (if any).
  • repeatedly iterate through the rest of the list (li.drop(n)), and compare the current element with the maximum of the list of minimums; replace, if neccessary, and resort again.
  • Improvements:
    • Throw away Int, and use ordered.
    • Throw away (_ > _) and use a user-Ordering to allow BottomTen. (Harder: pick the middle 10 :) )
    • Throw away List, and use Iterable instead

update (abstraction):

def extremeN [T](n: Int, li: List [T])   (comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)):      List[T] = {    def updateSofar (sofar: List [T], el: T) : List [T] =     if (comp1 (el, sofar.head))        (el :: sofar.tail).sortWith (comp2 (_, _))      else sofar    (li.take (n) .sortWith (comp2 (_, _)) /: li.drop (n)) (updateSofar (_, _))  }  /*  still bound to Int:   def top (n: Int, li: List [Int]) : List[Int] = {   extremeN (n, li) ((_ < _), (_ > _)) } def bottom (n: Int, li: List [Int]) : List[Int] = {   extremeN (n, li) ((_ > _), (_ < _)) } */  def top [T] (n: Int, li: List [T])    (implicit ord: Ordering[T]): Iterable[T] = {   extremeN (n, li) (ord.lt (_, _), ord.gt (_, _)) } def bottom [T] (n: Int, li: List [T])   (implicit ord: Ordering[T]): Iterable[T] = {   extremeN (n, li) (ord.gt (_, _), ord.lt (_, _)) }  top (3, li) bottom (3, li) val sl = List ("Haus", "Garten", "Boot", "Sumpf", "X", "y", "xkcd", "x11") bottom (2, sl) 

To replace List with Iterable seems to be a bit harder.

As Daniel C. Sobral pointed out in the comments, a high n in topN can lead to much sorting work, so that it could be useful, to do a manual insertion sort instead of repeatedly sorting the whole list of top-n elements:

def extremeN [T](n: Int, li: List [T])   (comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)):      List[T] = {    def sortedIns (el: T, list: List[T]): List[T] =      if (list.isEmpty) List (el) else      if (comp2 (el, list.head)) el :: list else        list.head :: sortedIns (el, list.tail)    def updateSofar (sofar: List [T], el: T) : List [T] =     if (comp1 (el, sofar.head))        sortedIns (el, sofar.tail)     else sofar    (li.take (n) .sortWith (comp2 (_, _)) /: li.drop (n)) (updateSofar (_, _))  } 

top/bottom method and usage as above. For small groups of top/bottom Elements, the sorting is rarely called, a few times in the beginning, and then less and less often over time. For example, 70 times with top (10) of 10 000, and 90 times with top (10) of 100 000.

like image 157
user unknown Avatar answered Sep 22 '22 22:09

user unknown


Here's another solution that is simple and has pretty good performance.

def pickTopN[T](k: Int, iterable: Iterable[T])(implicit ord: Ordering[T]): Seq[T] = {   val q = collection.mutable.PriorityQueue[T](iterable.toSeq:_*)   val end = Math.min(k, q.size)   (1 to end).map(_ => q.dequeue()) } 

The Big O is O(n + k log n), where k <= n. So the performance is linear for small k and at worst n log n.

The solution can also be optimized to be O(k) for memory but O(n log k) for performance. The idea is to use a MinHeap to track only the top k items at all times. Here's the solution.

def pickTopN[A, B](n: Int, iterable: Iterable[A], f: A => B)(implicit ord: Ordering[B]): Seq[A] = {   val seq = iterable.toSeq   val q = collection.mutable.PriorityQueue[A](seq.take(n):_*)(ord.on(f).reverse) // initialize with first n    // invariant: keep the top k scanned so far   seq.drop(n).foreach(v => {     q += v     q.dequeue()   })    q.dequeueAll.reverse } 
like image 42
Tom Wang Avatar answered Sep 22 '22 22:09

Tom Wang