The following (suboptimal) code generates all the subsets of size N of a given set.
This code works but, as I said, is highly suboptimal. Using an intermediate list to avoid the O(log(n)) cost of Set.insert doesn't seem to help, due to the large cost of later converting the list back to a Set.
Can anybody suggest how to optimize the code?
import qualified Data.Set as Set

subsetsOfSizeN :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsOfSizeN n s
  | Set.size s < n || n < 0 = error "subsetsOfSizeN: wrong parameters"
  | otherwise = doSubsetsOfSizeN n s
  where
    doSubsetsOfSizeN n s
      | n == 0 = Set.singleton Set.empty
      | Set.size s == n = Set.singleton s
      | otherwise =
          case Set.minView s of
            Nothing -> Set.empty
            Just (firstS, restS) ->
              let partialN n = doSubsetsOfSizeN n restS in
              Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
This is inspired by Pascal's triangle.
choose :: [b] -> Int -> [[b]]
_ `choose` 0 = [[]]
[] `choose` _ = []
(x:xs) `choose` k = (x:) `fmap` (xs `choose` (k-1)) ++ xs `choose` k
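To plug the list version into the original signature, something along these lines should do. This is only a sketch: `subsetsViaChoose` is a name made up here, not part of the answer, and it uses the plain `Set.fromList` without trying to be clever about ordering.

```haskell
import qualified Data.Set as Set

-- The list-based definition from above, repeated so the block compiles on its own.
choose :: [b] -> Int -> [[b]]
_      `choose` 0 = [[]]
[]     `choose` _ = []
(x:xs) `choose` k = (x:) `fmap` (xs `choose` (k-1)) ++ xs `choose` k

-- Hypothetical wrapper (not from the original answer): enumerate the
-- k-element sublists of the set's elements, then turn every sublist and
-- the outer list back into Sets.
subsetsViaChoose :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsViaChoose k s =
  Set.fromList (map Set.fromList (Set.toList s `choose` k))
```

For example, in GHCi, `[1,2,3] `choose` 2` evaluates to `[[1,2],[1,3],[2,3]]`.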
"This code works but, as I said, is highly suboptimal."
Doesn't seem so terribly bad to me. The number of subsets of size k of a set of size n is n `choose` k, which grows rather fast for k ~ n/2. So creating all the subsets must scale badly.
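To get a feel for how fast that count grows, here is a small sketch that computes it directly. The `binomial` helper is an illustration added here (it is not in the code above) and uses the usual multiplicative formula on `Integer` to avoid overflow.

```haskell
-- Number of k-element subsets of an n-element set.
-- After step i the accumulator holds C(n, i), so every division is exact.
binomial :: Integer -> Integer -> Integer
binomial n k
  | k < 0 || k > n = 0
  | otherwise      = foldl step 1 [1 .. min k (n - k)]
  where
    step acc i = acc * (n - i + 1) `div` i
```

For instance, `binomial 20 10` is 184756 and `binomial 30 15` is 155117520, so merely enumerating the result is already expensive near k = n/2, whatever the container.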
"Using an intermediate list to avoid the O(log(n)) of Set.insert doesn't seem to help due to the large cost of later reconverting the list to a Set."
Hmm, I found using lists to give better performance. Not asymptotically, I think, but by a non-negligible, more-or-less constant factor.
But first, there is an inefficiency in your code that is simple to repair:
Set.map (Set.insert firstS) (partialN (n-1))
Note that Set.map must rebuild a tree from scratch. But we know that firstS is always smaller than any element in any of the sets in partialN (n-1), so we can use Set.mapMonotonic, which can reuse the spine of the set.
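Spelled out, the amended Set-based version might look roughly like the sketch below. The name `subsetsMonotonic` and the shortened local names are mine; the only intended change from the question's code is `Set.map` becoming `Set.mapMonotonic`.

```haskell
import qualified Data.Set as Set

-- Sketch of the question's algorithm with Set.mapMonotonic swapped in.
-- Inserting the minimum element x into every subset of the remaining
-- elements preserves their relative order, which is the precondition
-- mapMonotonic relies on (it is not checked at run time).
subsetsMonotonic :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsMonotonic n s
  | Set.size s < n || n < 0 = error "subsetsMonotonic: wrong parameters"
  | otherwise               = go n s
  where
    go k t
      | k == 0          = Set.singleton Set.empty
      | Set.size t == k = Set.singleton t
      | otherwise       =
          case Set.minView t of
            Nothing        -> Set.empty
            Just (x, rest) ->
              let partial j = go j rest
              in  Set.mapMonotonic (Set.insert x) (partial (k - 1))
                    `Set.union` partial k
```

The `Set.union` is still there, so this only removes the rebuild done by `Set.map`, not all of the tree work.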
And that principle is also what makes lists attractive: the subsets are generated in lexicographic order, so instead of Set.fromList we can use the more efficient Set.fromDistinctAscList. Transcribing the algorithm yields
onlyLists :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
onlyLists n s
  | n == 0 = Set.singleton Set.empty
  | Set.size s < n || n < 0 = error "onlyLists: out of range n"
  | Set.size s == n = Set.singleton s
  | otherwise = Set.fromDistinctAscList . map Set.fromDistinctAscList $
                  go n (Set.size s) (Set.toList s)
  where
    go 1 _ xs = map return xs
    go k l (x:xs)
      | k == l = [x:xs]
      | otherwise = map (x:) (go (k-1) (l-1) xs) ++ go k (l-1) xs
which, in the few benchmarks I've run, is between 1.5 and 2× faster than the amended algorithm using Sets.
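Since Set.fromDistinctAscList does not check its precondition, it may be worth convincing yourself that the generated lists really are strictly ascending, both inside each subset and across subsets. A rough sketch of such a check follows; `combos`, `strictlyAscending` and `preconditionsHold` are hypothetical helpers written for this illustration, and `combos` is a simplified stand-in for `go`, not the same function.

```haskell
import qualified Data.Set as Set

-- Simplified stand-in for `go`: all k-element sublists, preserving input order.
combos :: Int -> [a] -> [[a]]
combos 0 _      = [[]]
combos _ []     = []
combos k (x:xs) = map (x:) (combos (k - 1) xs) ++ combos k xs

-- True when every element is strictly smaller than its successor.
strictlyAscending :: Ord a => [a] -> Bool
strictlyAscending ys = and (zipWith (<) ys (drop 1 ys))

-- Both preconditions at once: each sublist is ascending, and the list of
-- sublists is ascending in lexicographic order. Sets compare via their
-- ascending element lists, so this matches the ordering of the resulting Sets.
preconditionsHold :: Ord a => Int -> Set.Set a -> Bool
preconditionsHold k s = all strictlyAscending css && strictlyAscending css
  where
    css = combos k (Set.toList s)

-- The fast construction should agree with the checked one.
fastAgreesWithSafe :: Ord a => Int -> Set.Set a -> Bool
fastAgreesWithSafe k s = fast == safe
  where
    css  = combos k (Set.toList s)
    fast = Set.fromDistinctAscList (map Set.fromDistinctAscList css)
    safe = Set.fromList (map Set.fromList css)
```

Running `preconditionsHold` and `fastAgreesWithSafe` over a few small inputs is a cheap sanity check before trusting any benchmark numbers.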
And that is in turn, in my criterion benchmarks, nearly twice as fast as dave4420's.