Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Will this Haskell code take too much memory?

As in this code:

import Data.Char
groupsOf _ [] = []
groupsOf n xs = 
    take n xs : groupsOf n ( tail xs )

problem_8 x = maximum . map product . groupsOf 5 $ x
main = do t <- readFile "p8.log" 
      let digits = map digitToInt $concat $ lines t
      print $ problem_8 digits

I realize that in Haskell many programs construct and seem to store some intermediate results, such as the groupsOf list in the above code. The above code is the reference answer for Euler project problem 8, taken from Haskell website. The original question ask for the largest product of 5 consecutive digits in a very long chain of digits such as 45343231635723421312443535767983456. So the code will calculate all the products and store it as a list. In other languages I think they will only keep a temporary largest value and discard anything smaller.

So does groupsOf really stores all the intermediate results? what if the problem scales up? Will it allocate too much memory?

like image 753
McBear Holden Avatar asked Nov 25 '11 23:11

McBear Holden


3 Answers

No, not because of groupsOf. That one will only need to keep one group in memory at a time. However, maximum might build up a large thunk since it's too lazy, so make sure to either compile with -O or -O2 so that strictness analysis is performed, or use foldl1' max instead1.

1foldl1' is found in Data.List

like image 111
hammar Avatar answered Nov 01 '22 09:11

hammar


I realize that in Haskell many programs construct and seem to store some intermediate results, such as the groupsOf list in the above code.

"seem to" is a good thing to say here, because in all honesty this is where Haskell can really shine. It can, in fact, take up much less memory, without complicated micromanagement on your part, thanks to laziness.

One nifty thing about lazy IO (e.g. readFile) is that it will read only as much of the file as necessary, and the file can be garbage-collected as you consume its contents. (Well, roughly so. It depends on how you set the buffering.)

Here's a rough sketch of how your program will be executed:

t <- readFile "p8.log"

Create a thunk for the t :: String. Probably check that the file exists, and open it in read mode.

 let digits = map digitToInt $concat $ lines t

Create a thunk for digits :: [Int]

 print $ problem_8 digits

Once we attempt to execute this, all of the work will actually get done. We need to fully evaluate problem_8 digits :: Int in order to print it. So we create a thunk for problem_8 digits and force it.

maximum . map product . groupsOf 5 $ digits

At this point, maximum needs the first two elements of map product . groupsOf 5 $ digits in order to see which of the two is greater. Therefore map product needs to see the first two elements of groupsOf 5 digits to pass along.

Now, groupsOf 5 will need the first 5 elements of digits in order to produce its first element. At this point, the first line of the file will probably be read, and go through the defined transformations. groupsOf can construct its first element, and probably its second element (assuming there are more than 6 characters on that line). groupsOf 5 digits passes its first two elements up the chain, we map product onto those two things, and then maximum checks which of the two is larger. We keep the result, the larger of the two.

At this point (in fact somewhat earlier than this point) we can entirely discard the intermediate results. The first two elements of groupOf 5 digits are now entirely unnecessary; we will never need to inspect them again, and the garbage collector can collect them anytime. The first two characters of that file will not be used anymore either, and can be discarded.


Now, this is extremely handwavey and possibly slightly inaccurate. But do you get the idea? Haskell is lazy (well technically Haskell is "non-strict", which is generally implemented via laziness), which makes its execution style much different than any strict language. This allows us to use what appears to be tons of intermediate representations and data structures, without actually taking up tons of memory. Good compilers, such as GHC, can optimize memory usage like you wouldn't believe.

like image 37
Dan Burton Avatar answered Nov 01 '22 11:11

Dan Burton


I'm not sure what you mean by "intermediate results" here, exactly... But your code looks fine to me. Specifically, your implementation of groupsOf is tail recursive modulo cons, so I don't think you'll have to worry about stack overflow, if that's what you're concerned about.

I might be misunderstanding your question, though. Could you clarify?

like image 3
mergeconflict Avatar answered Nov 01 '22 10:11

mergeconflict