 

How to make chunking work with amazonka, conduit and lazy bytestring

I wrote the code below to simulate an upload to S3 from a lazy ByteString (which in production will be received over a network socket; here we simulate it by reading from a file of size ~100MB). The problem is that the code below seems to force the entire file into memory instead of chunking it (cbytes) - I'd appreciate pointers on why the chunking is not working:

import Control.Lens
import Network.AWS
import Network.AWS.S3
import Network.AWS.Data.Body
import System.IO
import           Data.Conduit (($$+-))
import           Data.Conduit.Binary (sinkLbs,sourceLbs)
import qualified Data.Conduit.List as CL (mapM_)
import           Network.HTTP.Conduit (responseBody,RequestBody(..),newManager,tlsManagerSettings)
import qualified Data.ByteString.Lazy as LBS

example :: IO PutObjectResponse
example = do
    -- To specify configuration preferences, newEnv is used to create a new Env. The Region denotes the AWS region requests will be performed against,
    -- and Credentials is used to specify the desired mechanism for supplying or retrieving AuthN/AuthZ information.
    -- In this case, Discover will cause the library to try a number of options such as default environment variables, or an instance's IAM Profile:
    e <- newEnv NorthVirginia Discover

    -- A new Logger to replace the default noop logger is created, with the logger set to print debug information and errors to stdout:
    l <- newLogger Debug stdout

    -- The payload for the S3 object is retrieved from a file that simulates lazy bytestring received over network
    inb <- LBS.readFile "out"
    lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
    let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)

    -- We now run the AWS computation with the overridden logger, performing the PutObject request:
    runResourceT . runAWS (e & envLogger .~ l) $
        send ((putObject "yourtestenv-change-it-please" "testbucket/test" cbytes) & poContentType .~ Just "text; charset=UTF-8")

main = example >> return ()

Running the executable with +RTS -s shows that the entire file is read into memory (~113MB maximum residency - I did see ~87MB once). On the other hand, if I use chunkedFile (sketched below), the body is chunked correctly (~10MB maximum residency).
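For reference, the chunkedFile variant that stays around ~10MB residency is roughly the following (a sketch; chunkedFile and defaultChunkSize are re-exported by Network.AWS from Network.AWS.Data.Body, and the rest mirrors the code above):

exampleChunkedFile :: IO PutObjectResponse
exampleChunkedFile = do
    e <- newEnv NorthVirginia Discover
    l <- newLogger Debug stdout

    -- chunkedFile opens the file itself and streams it in fixed-size chunks,
    -- so no lazy ByteString is ever built up in memory. defaultChunkSize is 128KB.
    cbytes <- chunkedFile defaultChunkSize "out"

    runResourceT . runAWS (e & envLogger .~ l) $
        send ((putObject "yourtestenv-change-it-please" "testbucket/test" cbytes) & poContentType .~ Just "text; charset=UTF-8")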

asked Oct 23 '25 by Sal

1 Answer

It's clear this bit

  inb <- LBS.readFile "out"
  lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
  let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)

should be rewritten as

  lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
  let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out") -- C being e.g. Data.Conduit.Binary, imported qualified

As you wrote it, the purpose of conduits is defeated: the entire file has to be accumulated by LBS.readFile, only to be broken apart chunk by chunk when fed to sourceLbs. (If lazy IO is working as intended, the full accumulation might not actually happen.) sourceFile, by contrast, reads the file incrementally, chunk by chunk. It is possible that something else, e.g. toBody, accumulates the whole file, in which case the point of conduits would be defeated at a different point - but glancing at the source for send and friends, I can't see anything that would do this.
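Put together, a corrected version of your example would look roughly like this (a sketch; it assumes Data.Conduit.Binary is imported qualified as C for sourceFile, and keeps your chunk size, bucket and key):

  import           Control.Lens
  import           Network.AWS
  import           Network.AWS.S3
  import           Network.AWS.Data.Body
  import           System.IO
  import qualified Data.Conduit.Binary as C (sourceFile)

  example :: IO PutObjectResponse
  example = do
      e <- newEnv NorthVirginia Discover
      l <- newLogger Debug stdout

      -- Only the length is read eagerly; the body itself is streamed from
      -- disk in 128KB chunks by sourceFile, keeping residency small.
      lenb <- System.IO.withFile "out" ReadMode hFileSize
      let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out")

      runResourceT . runAWS (e & envLogger .~ l) $
          send ((putObject "yourtestenv-change-it-please" "testbucket/test" cbytes) & poContentType .~ Just "text; charset=UTF-8")

With this change, the RTS statistics should show a maximum residency close to the chunk size rather than the file size.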

answered Oct 26 '25 by Michael


