Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does this not run in constant memory?

I am trying to write a very large amount of data to a file in constant memory.

import qualified Data.ByteString.Lazy as B

{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
    rng <- newPureMT
    let (grids,shuffleds) = createGrids rng aa
    createDirectoryIfMissing True "data/grids/"
    B.writeFile (gridFileName num aa)
                (encode (take num grids))
    B.writeFile (shuffledFileName num aa)
                (encode (take num shuffleds))

However this consumes memory proportional to the size of num. I know createGrids is a sufficiently lazy function because I have tested it by appending error "not lazy enough" (as suggested by the Haskell wiki here) to the end of the lists it returns and no errors are raised. take is a lazy function that is defined in Data.List. encode is also a lazy function defined in Data.Binary. B.writeFile is defined in Data.ByteString.Lazy.

Here is the complete code so you can execute it:

import Control.Arrow (first)
import Data.Binary
import GHC.Float (double2Float)
import System.Random (next)
import System.Random.Mersenne.Pure64 (PureMT, newPureMT, randomDouble)
import System.Random.Shuffle (shuffle')
import qualified Data.ByteString.Lazy as B

main :: IO ()
main = writeGrids 1000 64

{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
    rng <- newPureMT
    let (grids,shuffleds) = createGrids rng aa
    B.writeFile "grids.bin" (encode (take num grids))
    B.writeFile "shuffleds.bin" (encode (take num shuffleds))

{- a random number generator, dimension of grids to make
   returns a pair of lists, the first is a list of grids of dimensions
   aa x aa, the second is a list of the shuffled grids corresponding to the first list -}
createGrids :: PureMT -> Int -> ([[(Float,Float)]],[[(Float,Float)]])
createGrids rng aa = (grids,shuffleds) where
       rs = randomFloats rng
       grids = map (getGridR aa) (chunksOf (2 * aa * aa) rs) 
       shuffleds = shuffler (aa * aa) rng grids

{- length of each grid, a random number generator, a list of grids
   returns a the list with each grid shuffled -}
shuffler :: Int -> PureMT -> [[(Float,Float)]] -> [[(Float,Float)]]
shuffler n rng (xs:xss) = shuffle' xs n rng : shuffler n (snd (next rng))         xss
shuffler _ _ [] = []

{- divides list into chunks of size n -}
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = go 
     where go xs = case splitAt n xs of
              (ys,zs) | null ys -> []
                      | otherwise -> ys : go zs

{- dimension of grid, list of random floats [0,1]
   returns a list of (x,y) points of length n^2 such that all
   points are in the range [0,1] and the points are a randomly 
   perturbed regular grid -}
getGridR :: Int -> [Float] -> [(Float,Float)]
getGridR n rs = pts where
   nn = n * n
   (irs,jrs) = splitAt nn rs
   n' = fromIntegral n
   grid = [ (p,q) | p <- [0..n'-1], q <- [0..n'-1] ]
   pts = zipWith (\(p,q) (ir,jr) -> ((p+ir)/n',(q+jr)/n')) grid (zip irs jrs)

{- an infinite list of random floats in range [0,1] -}
randomFloats :: PureMT -> [Float]
randomFloats rng = let (d,rng') = first double2Float (randomDouble rng)
                   in d : randomFloats rng'

The required packages are: , bytestring , binary , random , mersenne-random-pure64 , random-shuffle

like image 358
Justin Raymond Avatar asked Jul 21 '15 13:07

Justin Raymond


People also ask

Why is constant memory faster than device memory?

As a result, a read from constant memory costs one memory read from device memory only on a cache miss; otherwise, it just costs one read from the constant cache. For all threads of a half warp, reading from the constant cache is as fast as reading from a register as long as all threads read the same address.

When should we not use constant memory in C++?

In essence, you should not use constant memory when every thread in a block don’t access same address space. For example; you have an array of 3,000 elements and you breaks this element to lunch sufficient number of threads in a block.

How to fix “your device does not have enough memory to run”?

To fix “Your device does not have enough memory to run this experience” on Roblox, you need to force close all apps and delete some apps or photos. Alternatively, you can try logging out and logging in or uninstalling and reinstalling Roblox. The error message indicates that your device does not have enough memory.

Why read from 64kB of constant memory?

There are two reasons why reading from the 64KB of constant memory can save bandwidth over standard reads of global memory: A single read from constant memory can be broadcast to other “nearby” threads, effectively saving up to 15 reads.


2 Answers

For what it's worth, here is a full solution combining the ideas of everyone here. Memory consumption is constant at ~6MB (compiled with -O2).

import Control.Arrow (first)
import Control.Monad.State (state, evalState)
import Data.Binary
import GHC.Float (double2Float)
import System.Random (next)
import System.Random.Mersenne.Pure64 (PureMT, newPureMT, randomDouble)
import System.Random.Shuffle (shuffle')
import qualified Data.ByteString as B (hPut)
import qualified Pipes.Binary as P (encode)
import qualified Pipes.Prelude as P (zip, mapM, drain)
import Pipes (runEffect, (>->))
import System.IO (withFile, IOMode(AppendMode))

main :: IO ()
main = writeGrids 1000 64

{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
    rng <- newPureMT
    let (grids, shuffleds) = createGrids rng aa
        gridFile = "grids.bin"
        shuffledFile = "shuffleds.bin"
        encoder = P.encode . SerList . take num
    writeFile gridFile ""
    writeFile shuffledFile ""
    withFile gridFile AppendMode $ \hGr ->
        withFile shuffledFile AppendMode $ \hSh ->
            runEffect
                $ P.zip (encoder grids) (encoder shuffleds)
                >-> P.mapM (\(ch1, ch2) -> B.hPut hGr ch1 >> B.hPut hSh ch2)
                >-> P.drain -- discards the stream of () results.

{- a random number generator, dimension of grids to make
   returns a pair of lists, the first is a list of grids of dimensions
   aa x aa, the second is a list of the shuffled grids corresponding to the first list -}
createGrids :: PureMT -> Int -> ( [[(Float,Float)]], [[(Float,Float)]] )
createGrids rng aa = unzip gridsAndShuffleds where
       rs = randomFloats rng
       grids =  map (getGridR aa) (chunksOf (2 * aa * aa) rs)
       gridsAndShuffleds = shuffler (aa * aa) rng grids

{- length of each grid, a random number generator, a list of grids
   returns a the list with each grid shuffled -}
shuffler :: Int -> PureMT -> [[(Float,Float)]] -> [( [(Float,Float)], [(Float,Float)] )]
shuffler n rng xss = evalState (traverse oneShuffle xss) rng
    where
    oneShuffle xs = state $ \r -> ((xs, shuffle' xs n r), snd (next r))

newtype SerList a = SerList { runSerList :: [a] }
    deriving (Show)

instance Binary a => Binary (SerList a) where
    put (SerList (x:xs)) = put False >> put x >> put (SerList xs)
    put _                = put True
    get = do
        stop <- get :: Get Bool
        if stop
            then return (SerList [])
            else do
                x          <- get
                SerList xs <- get
                return (SerList (x : xs))

{- divides list into chunks of size n -}
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = go 
     where go xs = case splitAt n xs of
              (ys,zs) | null ys -> []
                      | otherwise -> ys : go zs

{- dimension of grid, list of random floats [0,1]
   returns a list of (x,y) points of length n^2 such that all
   points are in the range [0,1] and the points are a randomly 
   perturbed regular grid -}
getGridR :: Int -> [Float] -> [(Float,Float)]
getGridR n rs = pts where
   nn = n * n
   (irs,jrs) = splitAt nn rs
   n' = fromIntegral n
   grid = [ (p,q) | p <- [0..n'-1], q <- [0..n'-1] ]
   pts = zipWith (\(p,q) (ir,jr) -> ((p+ir)/n',(q+jr)/n')) grid (zip irs jrs)

{- an infinite list of random floats in range [0,1] -}
randomFloats :: PureMT -> [Float]
randomFloats rng = let (d,rng') = first double2Float (randomDouble rng)
                   in d : randomFloats rng'

Comments on the changes:

  • shuffler is now a traversal with the State functor. It produces, in a single pass through the input list, a list of pairs, in which each grid is paired with its shuffled version. createGrids then (lazily) unzips this list.

  • The files are written to using pipes machinery, in a way loosely inspired by this answer (I originally wrote this using P.foldM). Note that the hPut I used is the strict bytestring one, for it acts on strict chunks supplied by the producer made with P.zip (which, in spirit, is a pair of lazy bytestrings that supplies chunks in pairs).

  • SerList is there to hold the custom Binary instance Thomas M. DuBuisson alludes to. Note that I haven't thought too much about laziness and strictness in the get method of the instance. If that causes you trouble, this question looks useful.

like image 29
duplode Avatar answered Sep 21 '22 05:09

duplode


Two reasons for the memory usage:

First, Data.Binary.encode doesn't seem to run in constant space. The following program uses 910 MB memory:

import Data.Binary
import qualified Data.ByteString.Lazy as B

len = 10000000 :: Int 

main = B.writeFile "grids.bin" $ encode [0..len]

If we leave a 0 out from len we get 97 MB memory usage.

In contrast, the following program uses 1 MB:

import qualified Data.ByteString.Lazy.Char8 as B

main = B.writeFile "grids.bin" $ B.pack $ show [0..(1000000::Int)]

Second, in your program shuffleds contains references to contents of grids, which prevents garbage collection of grids. So when we print grids, we also evaluate it and then it has to sit in memory until we finish printing shuffleds. The following version of your program still consumes lots of memory, but it uses constant space if we comment out one of the two lines with B.writeFile.

import qualified Data.ByteString.Lazy.Char8 as B

writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
    rng <- newPureMT
    let (grids,shuffleds) = createGrids rng aa
    B.writeFile "grids.bin" (B.pack $ show (take num grids))
    B.writeFile "shuffleds.bin" (B.pack $ show (take num shuffleds))
like image 54
András Kovács Avatar answered Sep 23 '22 05:09

András Kovács