Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Debug memory issue in Haskell

I'm trying to solve the whole Advent of Code series in Haskell.

I'm encountering a memory issue while solving the 2015/06 exercise where there is a bunch of instructions to turn on, off and toggle lights on a grid. The goal is to count the number of lit lights at the end.

Given instructions are parsed and stored in a Instruction type, this is the type definition:

data Instruction = Instruction Op Range deriving Show
data Op = Off | On | Toggle | Nop deriving Show
data Range = Range Start End deriving Show
type Start = Point
type End = Start
data Point = Point Int Int deriving Show

This is the code that calculates the result. I'm trying to abstract over the fact that a light is a Boolean by using a typeclass

gridWidth, gridHeight :: Int
gridWidth = 1000
gridHeight = 1000

initialGrid :: Togglable a => Matrix a
initialGrid = matrix gridWidth gridHeight (const initialState)

instance Monoid Op where
  mempty = Nop

instance Semigroup Op where
  _ <> On = On
  _ <> Off = Off
  x <> Nop = x
  Off <> Toggle = On
  On <> Toggle = Off
  Toggle <> Toggle = Nop
  Nop <> Toggle = Toggle

class Togglable a where
  initialState :: a
  apply :: Op -> a -> a

instance Togglable Bool where
  initialState = False
  apply On = const True
  apply Off = const False
  apply Toggle = not
  apply Nop = id

-- Does the Range of the instruction apply to this matrix coordinate?
(<?) :: Range -> (Int, Int) -> Bool
(<?) (Range start end) (x, y) = let
  (Point x1 y1) = start
  (Point x2 y2) = end
  (mx, my) = (x-1, y-1) -- translate from matrix coords (they start from 1!)
  in and [
    mx >= min x1 x2, mx <= max x1 x2,
    my >= min y1 y2, my <= max y1 y2
  ]

stepGenerator :: Instruction -> Matrix Op
stepGenerator (Instruction op r) = let
  g coord = if r <? coord then op else Nop
  in matrix gridWidth gridHeight g

allStepsMatrix :: [Instruction] -> Matrix Op
allStepsMatrix = mconcat.map stepGenerator

finalGrid :: Togglable a => Matrix a -> Matrix Op -> Matrix a
finalGrid z op = fmap apply op <*> z

countOn :: Matrix Bool -> Integer
countOn = toInteger.foldr (\x -> if x then (+1) else id) 0

partA :: Challenge (String -> Integer)
partA = Challenge $ countOn.finalGrid initialGrid.allStepsMatrix.parse

The solution will be the Integer returned by what's inside partA. parse works and has type parse :: String -> [Instruction]

The code compiles and runs with small matrices (e.g. 10x10), as soon as I turn gridWidth and gridHeight to 1000 I'm faced with a out of memory error, apparently generating from the allStepsMatrix function.

Any hint of what could be wrong here? Full code is on GitHub

like image 614
David Costa Avatar asked Jun 08 '19 14:06

David Costa


People also ask

How do I debug memory leak in Xcode?

Diagnose the Memory LeakChoose “Xcode” in the top left of the screen. Expand “Open Developer Tool,” and select “Instruments” Now choose “Leaks,” and make sure you have chosen your target app and device at the top (“Choose a profiling template for…”):

Does Haskell use a lot of memory?

Haskell computations produce a lot of memory garbage - much more than conventional imperative languages. It's because data are immutable so the only way to store every next operation's result is to create new values. In particular, every iteration of a recursive computation creates a new value.

What is a space leak in Haskell?

A space leak occurs when there exists a point in the computer program where it uses more memory than necessary. Hence, a space leak causes the program to use more space than one would expect. Our primary language of study would be Haskell.

What is a possible memory leak?

In computer science, a memory leak is a type of resource leak that occurs when a computer program incorrectly manages memory allocations in a way that memory which is no longer needed is not released.


1 Answers

I would strongly suggest not using a typeclass. Typeclasses are supposed to have laws, and they should be "rare", in the sense that each type has only a few valid implementations. I would suggest taking initialState and toggle as arguments, but even that is overkill, because the given instructions simply do not make sense with any type that isn't Bool. Just operate on a Matrix Bool directly and you can cut out a good chunk of the code you've written. However, I won't change anything for my answer.

In any case, I think the issue may be laziness. 1000 * 1000 = 1000000, so each Matrix will be several megabytes in size. On a 64-bit machine, a pointer is 8 bytes, so each Matrix is at least 8 MB, plus a few more for the data behind it. You are mconcating 300 of them (that's what I get from the site) together, but, because you are doing it lazily, all 300 matrices are resident simultaneously, so it's at least 2.4 GB, just for the structures themselves. The cost of filling each of those 300 million pointers with thunks also makes itself known—a thunk is at least one pointer (8 bytes, pointing to code in static memory, making another 2.4 GB), plus its payload, which, here, means more pointers, each one bestowing your computer with another 2.4 GB of memory pressure. I suggest deepseq:

instance NFData Op where
  rnf Off = ()
  rnf On = ()
  rnf Toggle = ()
  rnf Nop = ()
  -- rnf x = x `seq` () but I like to be explicit
allStepsMatrix :: [Instruction] -> Matrix Op
allStepsMatrix = foldl' (\x y -> force (x <> y)) mempty . map stepGenerator

Usnig foldl' lets this operate in constant stack space, but foldl or foldr would also work, because a stack depth on the order of 300 is nothing. The force means that all the elements of each Matrix are evaluated. Before, each matrix kept the previous one alive by holding references to it, but now the references are removed when the elements are evaluated, so the GC can throw them out in a timely manner. I have tested this and it terminates in reasonable time with much better space usage.

like image 75
HTNW Avatar answered Oct 08 '22 01:10

HTNW