I implemented a small function bruteforce
, using lazy evaluation to find the first valid solution for a problem:
import Data.Maybe
bruteforce :: (a -> Bool) -> [a] -> Maybe a
bruteforce f xs
| null result = Nothing
| otherwise = Just $ head result
where
result = mapMaybe bruteforce' xs
-- test one instance
bruteforce' x
| f x = Just x
| otherwise = Nothing
generatorString :: Int -> [String]
generatorString 0 = [""]
generatorString deep = concatMap (\x -> map (\ys -> (x:ys)) nextgen) ['a'..'z']
where nextgen = generatorString (deep - 1)
main :: IO ()
main = do
putStrLn $ fromJust $ bruteforce ((==) "zabcde") (generatorString 6)
also avaiable as a gist. yes, the test function is stupid, but it's enough to show what's the problem. It works as expected, however memory consumption is quite high:
$ ghc --make -O2 brute.hs -o brute -rtsopts
$ ./brute +RTS -s
zabcde
24,752,866,120 bytes allocated in the heap
15,748,430,224 bytes copied during GC
581,741,352 bytes maximum residency (22 sample(s))
18,464,056 bytes maximum slop
1719 MB total memory in use (0 MB lost due to fragmentation)
[...]
so I tried to force the RTS using less memory (i.e. call the GC more often). 200 MB should be enough?
$ ./brute +RTS -s -M200M
Heap exhausted;
Current maximum heap size is 209715200 bytes (200 MB);
use `+RTS -M<size>' to increase it.
well, nope (I wouldn't like this approach anyway...)
Any pointers how one could rewrite bruteforce
to be more memory friendly?
If the memory consumption is the primary concern, stop sharing nextgen
. That's a huge list.
So replace
generatorString deep = concatMap (\x -> map (\ys -> (x:ys)) nextgen) ['a'..'z']
where nextgen = generatorString (deep - 1)
with
generatorString deep = concatMap (\x -> map (\ys -> (x:ys)) $ generatorString (deep - 1)) ['a'..'z']
and tell the compiler that you're serious about not sharing it:
$ ghc -O2 -fno-full-laziness -rtsopts bruteforce
That runs in
$ ./bruteforce +RTS -s
zabcde
189,448,835,904 bytes allocated in the heap
18,301,350,520 bytes copied during GC
29,504 bytes maximum residency (16901 sample(s))
37,248 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
small resident memory. Of course the recomputation means the total allocation is much higher, and it also takes much longer to compute the result.
A better algorithm could reduce space and time consumption.
You could skip all those Just
s and Nothing
s I guess...
import Data.Maybe (listToMaybe)
bruteforce :: (a -> Bool) -> [a] -> Maybe a
bruteforce f = listToMaybe . filter f
This is probably as memory-friendly as bruteforce
can get; any other problems are because of the function f
that is being brute-forced.
The generatorString
function can be rewritten like this:
import Control.Monad (replicateM)
generatorString :: Int -> [String]
generatorString = flip replicateM ['a'..'z']
If you want me to explain how this works, let me know in a comment. What the improvement boils down to is that it uses prefix sharing instead of suffix sharing. I.e. it generates strings like this:
"aa"
"ab"
...
"az"
"ba"
... instead of:
"aa"
"ba"
...
"za"
"ab"
"bb"
This means that the prefix (in this simple example, it's only ('a':)
and then ('b':)
) is shared, instead of storing the list of suffixes and reusing it (the list ["a", "b", "c", ..., "z"]
in this example). You can imagine that as n
increases, the suffix list will grow with 26^n
, while the prefix list will grow with n
. This comes at the cost of constructing the whole current string at every iteration, of course, but the memory usage is much lower.
I think that you generator is kind of too strict. An optimal generator is supposed to yield as much of the result list as possible with as little information about the result of the recursive application.
Let us consider the following line.
concatMap (\x -> map (\ys -> (x:ys)) nextgen) ['a'..'z']
Now, let us check what happens if we only know that nextgen
is not the empty list. In order to illustrate this we replace the variable nextgen
by the expression undefined:undefined
. The following equations illustrate the evaluation of the considered expression.
concatMap (\x -> map (\ys -> (x:ys)) (undefined:undefined)) ['a'..'z']
= concat (map (\x -> map (\ys -> (x:ys)) (undefined:undefined)) ['a'..'z'])
= concat (map (\ys -> ('a':ys)) (undefined:undefined) : map (\x -> map (\ys -> (x:ys)) (undefined:undefined)) ['b'..'z'])
= concat (('a':undefined) : undefined) : map (\x -> map (\ys -> (x:ys)) (undefined:undefined)) ['b'..'z'])
= ('a':undefined) : undefined
Your specific application can already discard many results by comparing the first character of the generated string and the search string. Therefore, we are looking for a generator that yields the heads of the generated strings as soon as possible.
Let us change the roles of the static information (list of characters) and the recursive application. We get the following expression.
concatMap (\ys -> map (:ys) ['a'..'z']) nextgen
Now, let us do the same calculation as before.
concatMap (\ys -> map (:ys) ['a'..'z']) (undefined:undefined)
= concat (map (\ys -> map (:ys) ['a'..'z']) (undefined:undefined))
= concat (map (:undefined) ['a'...'z'] : map (\ys -> map (:ys) ['a'..'z']) undefined)
= map (:undefined) ['a'...'z'] ++ concat (map (\ys -> map (:ys) ['a'..'z']) undefined
The application map (:undefined) ['a'...'z']
already yields a list where all heads are defined. Therefore, the test can already fail for most of these strings by only evaluating the recursive application to head normal form.
With this altered implementation we get the following results.
$ ./brute +RTS -s
zabcde
4,165,170,696 bytes allocated in the heap
5,569,320 bytes copied during GC
29,848 bytes maximum residency (5 sample(s))
26,560 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Nevertheless, as this change is quite specific to the application at hand it might not be applicable to your actual application.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With