I'm looking for an efficient way to read numbers from a text file without installing additional packages. Data.ByteString.Lazy.Char8.readInt
seems to do the trick for integers. I've read that ByteString
now has a readDouble
method, but when I write import Data.ByteString.Lex.Lazy.Double (readDouble)
the compiler complains:
Main.hs:4:7: Could not find module `Data.ByteString.Lex.Lazy.Double': locations searched: Data/ByteString/Lex/Lazy/Double.hs Data/ByteString/Lex/Lazy/Double.lhs
My bytestring package version is 0.9.1.5.
So, am I doing something wrong? Or maybe there is a better solution for the problem? Thanks.
Update: OK, seems that readDouble
is in package bytestring-lexer which is not installed by default. Any other idea?
Another solution: install the bytestring-lexing package, and use readDouble
, which I optimized for you.
cabal install bytestring-lexing
The package provides optimized parsing functions for floating point literals:
readDouble :: ByteString -> Maybe (Double, ByteString)
The only time I encountered parsing doubles on the critical path, I used this:
{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString.Char8 as B
import Foreign.C.Types
import Foreign.C.String
import System.IO.Unsafe
foreign import ccall unsafe "stdlib.h atof" c_atof :: CString -> IO CDouble
unsafeReadDouble = unsafePerformIO . flip B.useAsCString c_atof
There wasn't anything that looked like a readDouble
in bytestring at that time, though. That would probably be a better solution if it's now standard.
Here's what I came up with.
I used the function offered by JB and added two tricks which I learned from the source code of bytestring-lexing (thanks, sclv!). The first one is this function:
strict = SB.concat . LB.toChunks
It transforms a lazy bytestring into non-lazy one efficiently.
The second trick is function Data.ByteString.Internal.inlinePerformIO
which is a more efficient variant of unsafePerformIO.
Here's complete code that allows a pretty fast number reading:
{-# LANGUAGE ForeignFunctionInterface #-}
import qualified Data.ByteString.Lazy.Char8 as LB
import qualified Data.ByteString as SB
import Data.ByteString.Internal (inlinePerformIO)
import Foreign.C.String (CString)
import Foreign.C (CDouble)
import Data.Maybe (fromJust)
foreign import ccall unsafe "stdlib.h atof" c_atof :: CString -> IO Double
unsafeReadDouble = inlinePerformIO . flip SB.useAsCString c_atof
{-# INLINE unsafeReadDouble #-}
readDouble = unsafeReadDouble . SB.concat . LB.toChunks
readInt = fst . fromJust . LB.readInt
And a sample program that calculates the sum of all numbers in the input:
It processes an 11Mb file (1M numbers) in about 0.5 seconds
main = LB.getContents >>= (print . sum . map readDouble . LB.lines)
I also found several links, where a much more efficient version of readInt
is discussed. Presumably one can build a readDouble
based on similar ideas. But I think I'll stick with my current version for now.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With