Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

F# Efficiently removing n items from the end of a Set

I know I can remove the last element from a set:

s.Remove(s.MaximumElement)

But if I want to remove the n maximum elements... do I just execute the above n times, or is there a faster way to do that?

To be clear, this is an obvious solution:

let rec removeLastN (s : Set<'a>, num : int) : Set<'a> = 
    match num with
    | 0 -> s
    | _ -> removeLast(s.Remove(s.MinimumElement), num-1)

But it involves creating a new set n times. Is there a way to do it and only create a new set once?

like image 427
mentics Avatar asked Nov 15 '22 08:11

mentics


1 Answers

But it involves creating a new set n times. Is there a way to do it and only create a new set once?

To the best of my knowledge, no. I'd say what you have a perfectly fine implementation, it runs in O(lg n) -- and its concise too :) Most heap implementations give you O(lg n) for delete min anyway, so what you have is about as good as you can get it.

You might be able to get a little better speed by rolling your balanced tree, and implementing a function to drop a left or right branch for all values greater than a certain value. I don't think an AVL tree or RB tree are appropriate in this context, since you can't really maintain their invariants, but a randomlized tree will give you the results you want.

A treap works awesome for this, because it uses randomization rather than tree invariants to keep itself relatively balanced. Unlike an AVL tree or a RB-tree, you can split a treap on a node without worrying about it being unbalanced. Here's a treap implementation I wrote a few months ago:

http://pastebin.com/j0aV3DJQ

I've added a split function, which will allows you take a tree and return two trees containing all values less than and all values greater than a given value. split runs in O(lg n) using a single pass through the tree, so you can prune entire branches of your tree in one shot -- provided that you know which value to split on.

But if I want to remove the n maximum elements... do I just execute the above n times, or is there a faster way to do that?

Using my Treap class:

open Treap

let nthLargest n t = Seq.nth n (Treap.toSeqBack t)
let removeTopN n t =
    let largest = nthLargest n t
    let smallerValues, wasFound, largerValues = t.Split(largest)
    smallerValues

let e = Treap.empty(fun (x : int) (y : int) -> x.CompareTo(y))
let t = [1 .. 100] |> Seq.fold (fun (acc : Treap<_>) x -> acc.Insert(x)) e
let t' = removeTopN 10 t

removeTopN runs in O(n + lg m) time, where n is the index into the tree sequence and m is the number of items in the tree.

I make no guarantees about the accuracy of my code, use at your own peril ;)

like image 154
Juliet Avatar answered Dec 05 '22 07:12

Juliet