I've been making rather poor attempts at the PRIME1 problem on SPOJ. I discovered that using ByteString really helped performance when reading in the problem input. However, using ByteString to write out the results is actually slightly slower than using the Prelude functions. I'm trying to figure out if I'm doing it wrong, or if this is expected.
I've profiled and timed (putStrLn . show) and its ByteString equivalents, written three different ways:
I expected options #2 and #3 to perform slower, since they build a list in one function and consume it in another; by printing the numbers as I generate them, I avoid allocating any memory for the list. On the other hand, that means a system call for every call to putStrLn, right? So I tested, and #1 was in fact the fastest.
The best performance came from option #1 with the Prelude ([Char]) functions. I expected the best performance to come from option #1 with ByteString, but that was not the case. I only used lazy ByteStrings; I didn't think that would matter, but would it?
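For reference, here is a rough, self-contained sketch of what I mean by option #1 in both flavours; primes is just placeholder data standing in for my real results:

import qualified Data.ByteString.Lazy.Char8 as L

-- Placeholder data standing in for the real results.
primes :: [Integer]
primes = [2, 3, 5, 7, 11, 13]

-- Option #1, Prelude flavour: print each number as it is produced.
printPrelude :: IO ()
printPrelude = mapM_ (putStrLn . show) primes

-- Option #1, lazy ByteString flavour: same structure, but each number
-- is packed into a ByteString before being written.
printByteString :: IO ()
printByteString = mapM_ (L.putStrLn . L.pack . show) primes

main :: IO ()
main = printPrelude >> printByteString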
Some questions:
My working hypothesis is that writing out Integers with ByteString is slower iff you aren't combining them with other text; if you are combining Integers with [Char], then you'd get better performance from ByteString. That is, the ByteString rewrite of:
putStrLn $ "the answer is: " ++ (show value)
will be much faster than the version written above. Is this true?
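For concreteness, the lazy ByteString version I have in mind looks roughly like this (value is just a placeholder):

import qualified Data.ByteString.Lazy.Char8 as L

-- Placeholder value standing in for the real result.
value :: Integer
value = 42

main :: IO ()
main = L.putStrLn $ L.pack "the answer is: " `L.append` L.pack (show value)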
Thanks for reading!
Doing bulk input is usually faster with bytestrings: since the data is dense, there's simply less data to shuffle from disk into memory.
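A typical bulk-input sketch, assuming newline-separated integers on stdin and using strict bytestrings with readInt, looks something like:

import qualified Data.ByteString.Char8 as B
import Data.Maybe (mapMaybe)

-- Read all of stdin in one go, then parse integers out of the dense
-- buffer with readInt. (Assumes one integer per line.)
main :: IO ()
main = do
    input <- B.getContents
    let nums = mapMaybe (fmap fst . B.readInt) (B.lines input)
    print (sum nums)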
Writing data as output, however, is a little different. Typically you're serializing a structure, generating many small writes, so the dense, bulk writes of bytestrings don't help you much in that case. Even regular Strings will do reasonably well at incremental output.
However, all is not lost. We can recover fast bulk writes by efficiently building up bytestrings in memory. This approach is taken by the various *-builder packages.
Instead of converting values to lots of tiny bytestrings and writing them out one at a time, we stream the conversion into an ever-growing buffer, and in turn write that buffer out in one big piece. This results in a lot less IO overhead, and performance improvements (often significant) over string IO.
This kind of approach is taken by, e.g., Haskell web servers and the efficient HTML generation library blaze.
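A sketch of that buffer-accumulation pattern, written here against Data.ByteString.Builder (shipped with newer versions of the bytestring package; the *-builder libraries follow the same pattern):

import qualified Data.ByteString.Builder as B
import Data.Monoid ((<>), mconcat)
import System.IO (stdout, hSetBinaryMode, hSetBuffering, BufferMode (BlockBuffering))

main :: IO ()
main = do
    -- Binary mode and block buffering let the builder fill the handle's
    -- buffer directly and flush it in large chunks.
    hSetBinaryMode stdout True
    hSetBuffering stdout (BlockBuffering Nothing)
    let results = [2, 3, 5, 7, 11] :: [Integer]   -- placeholder data
        out     = mconcat [ B.integerDec n <> B.char7 '\n' | n <- results ]
    B.hPutBuilder stdout out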
Also, the performance, even with bulk writes, will depend on the efficiency of whatever conversion function you use between your types and bytestrings. For Integer, you could be simply copying the bit pattern in memory to the output, or instead going through some inefficient decoder. As a result, you sometimes have to think a bit about the quality of the encoding function you're using, and not just about whether to use Char/String or bytestring IO.
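As a rough illustration of that difference, compare going Integer -> String -> ByteString against a direct decimal encoder such as integerDec from the builder API:

import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as L

n :: Integer
n = 123456789012345678901234567890

-- Indirect route: every digit passes through a boxed Char in an
-- intermediate String before reaching a byte buffer.
viaShow :: L.ByteString
viaShow = L.pack (show n)

-- More direct route: a decimal encoder that writes digits into the
-- Builder's buffer without an intermediate String.
viaBuilder :: L.ByteString
viaBuilder = B.toLazyByteString (B.integerDec n)

main :: IO ()
main = do
    L.putStrLn viaShow
    L.putStrLn viaBuilder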