Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Codec.Binary.Base64.encode is slow

Tags:

haskell

I am using http://hackage.haskell.org/package/dataenc-0.14.0.5/docs/Codec-Binary-Base64.html#v:encode and I found that this is terribly slow:

import qualified Codec.Binary.Base64 as C
import System.Environment

main = do
    [arg] <- getArgs
    print $ length $ C.encode $ replicate (read arg) 80

runtime for argument 10^5 : 0.5 s, 2*10^5 : 3.4 s, 3*10^5 : 9.4 s. But such things should run in linear time?

While I'm looking for the cause of the problem - is there a workaround (a different library)?

PS: this line of code https://github.com/magthe/dataenc/blob/master/src/Codec/Binary/Base64.hs#L77 looks highly questionable (that is, quadratic):

doEnc acc (o1:o2:o3:os) = doEnc (acc ++ enc3 [o1, o2, o3]) os
doEnc acc os = EPart acc (eI os)

Sure enough, using the straightforward code

encode ws = case ws of
    [] -> [] 
    o1:ws2 -> case ws2 of
        [] -> take 2 (enc3 [o1,0,0]) ++ "=="
        o2:ws3 -> case ws3 of
            [] -> take 3 (enc3 [o1,o2,0]) ++ "="
            o3:ws4 -> enc3 [o1,o2,o3] ++ encode ws4

already gives a drastic improvement (e.g., 10^8 bytes encoded in 4 s)

PPS: as per the suggestion below, this program

import qualified Data.ByteString as B
import qualified Data.ByteString.Base64 as B64
import System.Environment

main = do
    [arg] <- getArgs
    print $ B.length $ B64.encode $ B.replicate (read arg) 80

encodes 10^8 bytes in 0.4 s.

like image 405
d8d0d65b3f7cf42 Avatar asked Oct 20 '22 17:10

d8d0d65b3f7cf42


1 Answers

Try the base64-bytestring library. Typically using ByteStrings when dealing with binary data will vastly improve performance. The list type is useful as a control structure for its simplicity but it does not perform well.

like image 69
J. Abrahamson Avatar answered Nov 15 '22 09:11

J. Abrahamson