Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient number reading in Haskell

I'm looking for an efficient way to read numbers from a text file without installing additional packages. Data.ByteString.Lazy.Char8.readInt seems to do the trick for integers. I've read that ByteString now has a readDouble method, but when I write import Data.ByteString.Lex.Lazy.Double (readDouble) the compiler complains:

    Main.hs:4:7:
        Could not find module `Data.ByteString.Lex.Lazy.Double':
          locations searched:
            Data/ByteString/Lex/Lazy/Double.hs
            Data/ByteString/Lex/Lazy/Double.lhs

My bytestring package version is 0.9.1.5.

So, am I doing something wrong? Or maybe there is a better solution for the problem? Thanks.

Update: OK, seems that readDouble is in package bytestring-lexer which is not installed by default. Any other idea?

like image 563
adamax Avatar asked Dec 20 '10 12:12

adamax


3 Answers

Another solution: install the bytestring-lexing package, and use readDouble, which I optimized for you.

 cabal install bytestring-lexing

The package provides optimized parsing functions for floating point literals:

 readDouble :: ByteString -> Maybe (Double, ByteString)         
like image 175
Don Stewart Avatar answered Oct 04 '22 03:10

Don Stewart


The only time I encountered parsing doubles on the critical path, I used this:

{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString.Char8 as B
import Foreign.C.Types
import Foreign.C.String
import System.IO.Unsafe

foreign import ccall unsafe "stdlib.h atof" c_atof :: CString -> IO CDouble
unsafeReadDouble = unsafePerformIO . flip B.useAsCString c_atof

There wasn't anything that looked like a readDouble in bytestring at that time, though. That would probably be a better solution if it's now standard.

like image 27
JB. Avatar answered Oct 04 '22 04:10

JB.


Here's what I came up with.

I used the function offered by JB and added two tricks which I learned from the source code of bytestring-lexing (thanks, sclv!). The first one is this function:

strict = SB.concat . LB.toChunks

It transforms a lazy bytestring into non-lazy one efficiently.

The second trick is function Data.ByteString.Internal.inlinePerformIO which is a more efficient variant of unsafePerformIO.

Here's complete code that allows a pretty fast number reading:


{-# LANGUAGE ForeignFunctionInterface #-}

import qualified Data.ByteString.Lazy.Char8 as LB
import qualified Data.ByteString as SB
import Data.ByteString.Internal (inlinePerformIO)
import Foreign.C.String (CString)
import Foreign.C (CDouble)
import Data.Maybe (fromJust)

foreign import ccall unsafe "stdlib.h atof" c_atof :: CString -> IO Double
unsafeReadDouble = inlinePerformIO . flip SB.useAsCString c_atof
{-# INLINE unsafeReadDouble #-}
readDouble = unsafeReadDouble . SB.concat . LB.toChunks
readInt = fst . fromJust . LB.readInt

And a sample program that calculates the sum of all numbers in the input:

main = LB.getContents >>= (print . sum . map readDouble . LB.lines)
It processes an 11Mb file (1M numbers) in about 0.5 seconds

I also found several links, where a much more efficient version of readInt is discussed. Presumably one can build a readDouble based on similar ideas. But I think I'll stick with my current version for now.

like image 30
adamax Avatar answered Oct 04 '22 04:10

adamax