Is there a simple and efficient solution to determine the top n elements of a Scala Iterable? I mean something like
iter.toList.sortBy(_.myAttr).take(2)
but without having to sort all elements when only the top 2 are of interest. Ideally I'm looking for something like
iter.top(2, _.myAttr)
see also: Solution for the top element using an Ordering: In Scala, how to use Ordering[T] with List.min or List.max and keep code readable
Thank you all for your solutions. Finally, I took the original solution of user unknown and adopted it to use Iterable
and the pimp-my-library pattern:
implicit def iterExt[A](iter: Iterable[A]) = new { def top[B](n: Int, f: A => B)(implicit ord: Ordering[B]): List[A] = { def updateSofar (sofar: List [A], el: A): List [A] = { //println (el + " - " + sofar) if (ord.compare(f(el), f(sofar.head)) > 0) (el :: sofar.tail).sortBy (f) else sofar } val (sofar, rest) = iter.splitAt(n) (sofar.toList.sortBy (f) /: rest) (updateSofar (_, _)).reverse } } case class A(s: String, i: Int) val li = List (4, 3, 6, 7, 1, 2, 9, 5).map(i => A(i.toString(), i)) println(li.top(3, _.i))
After collections like Lists, Sets, and Maps, we will discuss a way to iterate on them (how we can access their elements one by one). Two important methods in Scala Iterator are next () and hasNext. hasNext will tell if there is another element to return, while next () will return that element.
Also, once it reaches the end of the iterator, the iterator is now empty. We can use it.min and it.max to determine the largest and smallest elements in a Scala iterator. This will return the smallest value in the Scala iterator. scala> val it=Iterator (3,2,4,9,7) it: Iterator [Int] = non-empty iterator scala> it.min res1: Int = 2
This appends the elements of the Scala iterator to a String Builder and returns it. scala> val it=Iterator (3,2,4,9,7) it: Iterator [Int] = non-empty iterator scala> it.addString (new StringBuilder ()) res18: StringBuilder = 32497 This lets us include a separator for the above functionality.
A call to it.next () will return the next element of the iterator and advance the state of the iterator. You can find out whether there are more elements to return using Iterator's it.hasNext method. The most straightforward way to "step through" all the elements returned by an iterator is to use a while loop.
My solution (bound to Int, but should be easily changed to Ordered (a few minutes please):
def top (n: Int, li: List [Int]) : List[Int] = { def updateSofar (sofar: List [Int], el: Int) : List [Int] = { // println (el + " - " + sofar) if (el < sofar.head) (el :: sofar.tail).sortWith (_ > _) else sofar } /* better readable: val sofar = li.take (n).sortWith (_ > _) val rest = li.drop (n) (sofar /: rest) (updateSofar (_, _)) */ (li.take (n). sortWith (_ > _) /: li.drop (n)) (updateSofar (_, _)) }
usage:
val li = List (4, 3, 6, 7, 1, 2, 9, 5) top (2, li)
def extremeN [T](n: Int, li: List [T]) (comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)): List[T] = { def updateSofar (sofar: List [T], el: T) : List [T] = if (comp1 (el, sofar.head)) (el :: sofar.tail).sortWith (comp2 (_, _)) else sofar (li.take (n) .sortWith (comp2 (_, _)) /: li.drop (n)) (updateSofar (_, _)) } /* still bound to Int: def top (n: Int, li: List [Int]) : List[Int] = { extremeN (n, li) ((_ < _), (_ > _)) } def bottom (n: Int, li: List [Int]) : List[Int] = { extremeN (n, li) ((_ > _), (_ < _)) } */ def top [T] (n: Int, li: List [T]) (implicit ord: Ordering[T]): Iterable[T] = { extremeN (n, li) (ord.lt (_, _), ord.gt (_, _)) } def bottom [T] (n: Int, li: List [T]) (implicit ord: Ordering[T]): Iterable[T] = { extremeN (n, li) (ord.gt (_, _), ord.lt (_, _)) } top (3, li) bottom (3, li) val sl = List ("Haus", "Garten", "Boot", "Sumpf", "X", "y", "xkcd", "x11") bottom (2, sl)
To replace List with Iterable seems to be a bit harder.
As Daniel C. Sobral pointed out in the comments, a high n
in topN can lead to much sorting work, so that it could be useful, to do a manual insertion sort instead of repeatedly sorting the whole list of top-n elements:
def extremeN [T](n: Int, li: List [T]) (comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)): List[T] = { def sortedIns (el: T, list: List[T]): List[T] = if (list.isEmpty) List (el) else if (comp2 (el, list.head)) el :: list else list.head :: sortedIns (el, list.tail) def updateSofar (sofar: List [T], el: T) : List [T] = if (comp1 (el, sofar.head)) sortedIns (el, sofar.tail) else sofar (li.take (n) .sortWith (comp2 (_, _)) /: li.drop (n)) (updateSofar (_, _)) }
top/bottom method and usage as above. For small groups of top/bottom Elements, the sorting is rarely called, a few times in the beginning, and then less and less often over time. For example, 70 times with top (10) of 10 000, and 90 times with top (10) of 100 000.
Here's another solution that is simple and has pretty good performance.
def pickTopN[T](k: Int, iterable: Iterable[T])(implicit ord: Ordering[T]): Seq[T] = { val q = collection.mutable.PriorityQueue[T](iterable.toSeq:_*) val end = Math.min(k, q.size) (1 to end).map(_ => q.dequeue()) }
The Big O is O(n + k log n)
, where k <= n
. So the performance is linear for small k
and at worst n log n
.
The solution can also be optimized to be O(k)
for memory but O(n log k)
for performance. The idea is to use a MinHeap to track only the top k
items at all times. Here's the solution.
def pickTopN[A, B](n: Int, iterable: Iterable[A], f: A => B)(implicit ord: Ordering[B]): Seq[A] = { val seq = iterable.toSeq val q = collection.mutable.PriorityQueue[A](seq.take(n):_*)(ord.on(f).reverse) // initialize with first n // invariant: keep the top k scanned so far seq.drop(n).foreach(v => { q += v q.dequeue() }) q.dequeueAll.reverse }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With