 

Fast generation of all the subsets of size N in Haskell

The following (suboptimal) code generates all the subsets of size N of a given set.

This code works but, as I said, it is highly suboptimal. Using an intermediate list to avoid the O(log n) cost of Set.insert doesn't seem to help, because of the large cost of later converting the list back to a Set.

Can anybody suggest how to optimize the code?

import qualified Data.Set as Set


subsetsOfSizeN :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsOfSizeN n s
  | Set.size s < n || n < 0 = error "subsetsOfSizeN: wrong parameters"
  | otherwise = doSubsetsOfSizeN n s
 where doSubsetsOfSizeN n s
        | n == 0 = Set.singleton Set.empty
        | Set.size s == n = Set.singleton s
        | otherwise =
           case Set.minView s of
             Nothing -> Set.empty
             Just (firstS, restS) ->
               let partialN n = doSubsetsOfSizeN n restS in
               Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
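
For illustration, a small GHCi session with this definition behaves as expected:

ghci> subsetsOfSizeN 2 (Set.fromList [1,2,3])
fromList [fromList [1,2],fromList [1,3],fromList [2,3]]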
asked Nov 30 '22 by fons


2 Answers

This is inspired by Pascal's triangle.

choose :: [b] -> Int -> [[b]]
_      `choose` 0       = [[]]
[]     `choose` _       =  []
(x:xs) `choose` k       =  (x:) `fmap` (xs `choose` (k-1)) ++ xs `choose` k
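
To see it in action, a quick GHCi check (illustrative):

ghci> [1,2,3,4] `choose` 2
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]

For the question's Set input, applying it to Set.toAscList s produces the same subsets as ascending lists.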
answered Dec 05 '22 by rickythesk8r


This code works but, as I said, is highly suboptimal.

Doesn't seem so terribly bad to me. The number of subsets of size k of a set of size n is n `choose` k, which grows rather fast for k ~ n/2. So creating all the subsets must scale badly.
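
For a rough sense of scale, here is a small illustrative helper, written just for this estimate, that computes the binomial coefficient directly:

binomial :: Integer -> Integer -> Integer
binomial n k = product [n - k + 1 .. n] `div` product [1 .. k]

-- binomial 30 15 == 155117520, i.e. more than 1.5 * 10^8 subsets
-- of size 15 of a 30-element set, before any Set overhead.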

Using an intermediate list to avoid the O(log n) of Set.insert doesn't seem to help due to the large cost of later converting the list back to a Set.

Hmm, I found that using lists gives better performance. Not asymptotically, I think, but by a non-negligible, more-or-less constant factor.

But first, there is an inefficiency in your code that is simple to repair:

Set.map (Set.insert firstS) (partialN (n-1))

Note that Set.map must rebuild a tree from scratch. But we know that firstS is always smaller than any element in any of the sets in partialN (n-1), so we can use Set.mapMonotonic, which can reuse the spine of the set.
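
Concretely, the amended line becomes:

Set.mapMonotonic (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n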

And that principle is also what makes lists attractive: the subsets are generated in lexicographic order, so instead of Set.fromList we can use the more efficient Set.fromDistinctAscList. Transcribing the algorithm to lists yields

onlyLists :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
onlyLists n s
    | n == 0                    = Set.singleton Set.empty
    | Set.size s < n || n < 0   = error "onlyLists: out of range n"
    | Set.size s == n           = Set.singleton s
    | otherwise                 = Set.fromDistinctAscList . map Set.fromDistinctAscList $
                                                         go n (Set.size s) (Set.toList s)
      where
        -- go k l xs enumerates the k-element sublists of xs, where l is the
        -- length of xs; the sublists come out in lexicographic order, which
        -- is what justifies fromDistinctAscList above.
        go 1 _ xs = map return xs
        go k l (x:xs)
            | k == l = [x:xs]
            | otherwise = map (x:) (go (k-1) (l-1) xs) ++ go k (l-1) xs

which in the few benchmarks I've run is between 1.5 and 2× faster than the amended algorithm using Sets.

And that is in turn, in my criterion benchmarks, nearly twice as fast as the list-based `choose` from the other answer.
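
For completeness, a quick illustrative check in GHCi:

ghci> onlyLists 2 (Set.fromList [1,2,3,4])
fromList [fromList [1,2],fromList [1,3],fromList [1,4],fromList [2,3],fromList [2,4],fromList [3,4]]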

answered Dec 05 '22 by Daniel Fischer