I am trying to write a very large amount of data to a file in constant memory.
import qualified Data.ByteString.Lazy as B
{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
rng <- newPureMT
let (grids,shuffleds) = createGrids rng aa
createDirectoryIfMissing True "data/grids/"
B.writeFile (gridFileName num aa)
(encode (take num grids))
B.writeFile (shuffledFileName num aa)
(encode (take num shuffleds))
However this consumes memory proportional to the size of num
. I know createGrids
is a sufficiently lazy function because I have tested it by appending error "not lazy enough"
(as suggested by the Haskell wiki here) to the end of the lists it returns and no errors are raised. take
is a lazy function that is defined in Data.List
. encode
is also a lazy function defined in Data.Binary
. B.writeFile
is defined in Data.ByteString.Lazy
.
Here is the complete code so you can execute it:
import Control.Arrow (first)
import Data.Binary
import GHC.Float (double2Float)
import System.Random (next)
import System.Random.Mersenne.Pure64 (PureMT, newPureMT, randomDouble)
import System.Random.Shuffle (shuffle')
import qualified Data.ByteString.Lazy as B
main :: IO ()
main = writeGrids 1000 64
{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
rng <- newPureMT
let (grids,shuffleds) = createGrids rng aa
B.writeFile "grids.bin" (encode (take num grids))
B.writeFile "shuffleds.bin" (encode (take num shuffleds))
{- a random number generator, dimension of grids to make
returns a pair of lists, the first is a list of grids of dimensions
aa x aa, the second is a list of the shuffled grids corresponding to the first list -}
createGrids :: PureMT -> Int -> ([[(Float,Float)]],[[(Float,Float)]])
createGrids rng aa = (grids,shuffleds) where
rs = randomFloats rng
grids = map (getGridR aa) (chunksOf (2 * aa * aa) rs)
shuffleds = shuffler (aa * aa) rng grids
{- length of each grid, a random number generator, a list of grids
returns a the list with each grid shuffled -}
shuffler :: Int -> PureMT -> [[(Float,Float)]] -> [[(Float,Float)]]
shuffler n rng (xs:xss) = shuffle' xs n rng : shuffler n (snd (next rng)) xss
shuffler _ _ [] = []
{- divides list into chunks of size n -}
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = go
where go xs = case splitAt n xs of
(ys,zs) | null ys -> []
| otherwise -> ys : go zs
{- dimension of grid, list of random floats [0,1]
returns a list of (x,y) points of length n^2 such that all
points are in the range [0,1] and the points are a randomly
perturbed regular grid -}
getGridR :: Int -> [Float] -> [(Float,Float)]
getGridR n rs = pts where
nn = n * n
(irs,jrs) = splitAt nn rs
n' = fromIntegral n
grid = [ (p,q) | p <- [0..n'-1], q <- [0..n'-1] ]
pts = zipWith (\(p,q) (ir,jr) -> ((p+ir)/n',(q+jr)/n')) grid (zip irs jrs)
{- an infinite list of random floats in range [0,1] -}
randomFloats :: PureMT -> [Float]
randomFloats rng = let (d,rng') = first double2Float (randomDouble rng)
in d : randomFloats rng'
The required packages are: , bytestring , binary , random , mersenne-random-pure64 , random-shuffle
As a result, a read from constant memory costs one memory read from device memory only on a cache miss; otherwise, it just costs one read from the constant cache. For all threads of a half warp, reading from the constant cache is as fast as reading from a register as long as all threads read the same address.
In essence, you should not use constant memory when every thread in a block don’t access same address space. For example; you have an array of 3,000 elements and you breaks this element to lunch sufficient number of threads in a block.
To fix “Your device does not have enough memory to run this experience” on Roblox, you need to force close all apps and delete some apps or photos. Alternatively, you can try logging out and logging in or uninstalling and reinstalling Roblox. The error message indicates that your device does not have enough memory.
There are two reasons why reading from the 64KB of constant memory can save bandwidth over standard reads of global memory: A single read from constant memory can be broadcast to other “nearby” threads, effectively saving up to 15 reads.
For what it's worth, here is a full solution combining the ideas of everyone here. Memory consumption is constant at ~6MB (compiled with -O2
).
import Control.Arrow (first)
import Control.Monad.State (state, evalState)
import Data.Binary
import GHC.Float (double2Float)
import System.Random (next)
import System.Random.Mersenne.Pure64 (PureMT, newPureMT, randomDouble)
import System.Random.Shuffle (shuffle')
import qualified Data.ByteString as B (hPut)
import qualified Pipes.Binary as P (encode)
import qualified Pipes.Prelude as P (zip, mapM, drain)
import Pipes (runEffect, (>->))
import System.IO (withFile, IOMode(AppendMode))
main :: IO ()
main = writeGrids 1000 64
{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
rng <- newPureMT
let (grids, shuffleds) = createGrids rng aa
gridFile = "grids.bin"
shuffledFile = "shuffleds.bin"
encoder = P.encode . SerList . take num
writeFile gridFile ""
writeFile shuffledFile ""
withFile gridFile AppendMode $ \hGr ->
withFile shuffledFile AppendMode $ \hSh ->
runEffect
$ P.zip (encoder grids) (encoder shuffleds)
>-> P.mapM (\(ch1, ch2) -> B.hPut hGr ch1 >> B.hPut hSh ch2)
>-> P.drain -- discards the stream of () results.
{- a random number generator, dimension of grids to make
returns a pair of lists, the first is a list of grids of dimensions
aa x aa, the second is a list of the shuffled grids corresponding to the first list -}
createGrids :: PureMT -> Int -> ( [[(Float,Float)]], [[(Float,Float)]] )
createGrids rng aa = unzip gridsAndShuffleds where
rs = randomFloats rng
grids = map (getGridR aa) (chunksOf (2 * aa * aa) rs)
gridsAndShuffleds = shuffler (aa * aa) rng grids
{- length of each grid, a random number generator, a list of grids
returns a the list with each grid shuffled -}
shuffler :: Int -> PureMT -> [[(Float,Float)]] -> [( [(Float,Float)], [(Float,Float)] )]
shuffler n rng xss = evalState (traverse oneShuffle xss) rng
where
oneShuffle xs = state $ \r -> ((xs, shuffle' xs n r), snd (next r))
newtype SerList a = SerList { runSerList :: [a] }
deriving (Show)
instance Binary a => Binary (SerList a) where
put (SerList (x:xs)) = put False >> put x >> put (SerList xs)
put _ = put True
get = do
stop <- get :: Get Bool
if stop
then return (SerList [])
else do
x <- get
SerList xs <- get
return (SerList (x : xs))
{- divides list into chunks of size n -}
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = go
where go xs = case splitAt n xs of
(ys,zs) | null ys -> []
| otherwise -> ys : go zs
{- dimension of grid, list of random floats [0,1]
returns a list of (x,y) points of length n^2 such that all
points are in the range [0,1] and the points are a randomly
perturbed regular grid -}
getGridR :: Int -> [Float] -> [(Float,Float)]
getGridR n rs = pts where
nn = n * n
(irs,jrs) = splitAt nn rs
n' = fromIntegral n
grid = [ (p,q) | p <- [0..n'-1], q <- [0..n'-1] ]
pts = zipWith (\(p,q) (ir,jr) -> ((p+ir)/n',(q+jr)/n')) grid (zip irs jrs)
{- an infinite list of random floats in range [0,1] -}
randomFloats :: PureMT -> [Float]
randomFloats rng = let (d,rng') = first double2Float (randomDouble rng)
in d : randomFloats rng'
Comments on the changes:
shuffler
is now a traversal with the State
functor. It produces, in a single pass through the input list, a list of pairs, in which each grid is paired with its shuffled version. createGrids
then (lazily) unzips this list.
The files are written to using pipes
machinery, in a way loosely inspired by this answer (I originally wrote this using P.foldM
). Note that the hPut
I used is the strict bytestring one, for it acts on strict chunks supplied by the producer made with P.zip
(which, in spirit, is a pair of lazy bytestrings that supplies chunks in pairs).
SerList
is there to hold the custom Binary
instance Thomas M. DuBuisson alludes to. Note that I haven't thought too much about laziness and strictness in the get
method of the instance. If that causes you trouble, this question looks useful.
Two reasons for the memory usage:
First, Data.Binary.encode
doesn't seem to run in constant space. The following program uses 910 MB memory:
import Data.Binary
import qualified Data.ByteString.Lazy as B
len = 10000000 :: Int
main = B.writeFile "grids.bin" $ encode [0..len]
If we leave a 0
out from len
we get 97 MB memory usage.
In contrast, the following program uses 1 MB:
import qualified Data.ByteString.Lazy.Char8 as B
main = B.writeFile "grids.bin" $ B.pack $ show [0..(1000000::Int)]
Second, in your program shuffleds
contains references to contents of grids
, which prevents garbage collection of grids
. So when we print grids
, we also evaluate it and then it has to sit in memory until we finish printing shuffleds
. The following version of your program still consumes lots of memory, but it uses constant space if we comment out one of the two lines with B.writeFile
.
import qualified Data.ByteString.Lazy.Char8 as B
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
rng <- newPureMT
let (grids,shuffleds) = createGrids rng aa
B.writeFile "grids.bin" (B.pack $ show (take num grids))
B.writeFile "shuffleds.bin" (B.pack $ show (take num shuffleds))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With