Haskell or F# high throughput binary I/O

Tags: .net, io, haskell, f#

How good is the performance of binary I/O libraries in these two languages? I am contemplating rewriting an ugly (yet very fast) piece of C++ code that processes binary files of around 5-10 GB using the standard fread and fwrite functions. What slow-down factor should I expect for an optimized implementation in F# and Haskell?

EDIT: Here is a C implementation that counts zero bytes (buffer allocated on the heap).

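#include <stdio.h>
#include <stdlib.h>

#define SIZE (32 * 1024)  /* read buffer size in bytes */

int main(int argc, char *argv[])
{
    FILE *fp;
    char *buf;
    size_t i, l;
    long s = 0;  /* number of zero bytes seen so far */

    if (argc < 2) {
        fprintf(stderr, "Usage: %s FILE\n", argv[0]);
        return EXIT_FAILURE;
    }
    fp = fopen(argv[1], "rb");
    if (!fp) {
        fprintf(stderr, "Opening %s failed\n", argv[1]);
        return EXIT_FAILURE;
    }
    buf = malloc(SIZE);
    if (!buf) {
        fclose(fp);
        return EXIT_FAILURE;
    }
    /* Loop on fread's return value rather than on feof(), which only
       becomes true after a read has already hit end of file. */
    while ((l = fread(buf, 1, SIZE, fp)) > 0) {
        for (i = 0; i < l; ++i) {
            if (buf[i] == 0) {
                ++s;
            }
        }
    }
    printf("%ld\n", s);  /* s is a long, so %ld rather than %d */
    fclose(fp);
    free(buf);
    return 0;
}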

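The results (the C binary timed here printed its count with %d, so the value shown appears to be the true count truncated to 32 bits: 4757708340 mod 2^32 = 462741044):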


$ gcc -O3 -o ioc io.c
$ ghc --make -O3 -o iohs io.hs
Linking iohs ...
$ time ./ioc 2.bin
462741044

real    0m16.171s
user    0m11.755s
sys     0m4.413s
$ time ./iohs 2.bin
4757708340

real    0m16.879s
user    0m14.093s
sys     0m2.783s
$ ls -lh 2.bin
-rw-r--r-- 1  14G Jan  4 10:05 2.bin

asked Dec 31 '10 17:12

user394460

2 Answers

Haskell using lazy ByteString-based I/O with a "binary" parser should perform about the same as C code doing the same job on the same data types.

The key packages to be aware of:

bytestring
binary
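
As a rough illustration of how the two packages fit together, here is a minimal sketch that takes a lazy ByteString and parses a fixed-size header from it with Data.Binary.Get. The Header type, its field names, and the 6-byte little-endian layout are invented for this example; a real file format would dictate its own layout.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, runGet, getWord32le, getWord16le)
import Data.Word (Word16, Word32)

-- Hypothetical record header, for illustration only:
-- a 32-bit little-endian id followed by a 16-bit little-endian length.
data Header = Header { recId :: Word32, recLen :: Word16 }
  deriving (Eq, Show)

-- Parser built from the primitive getters in Data.Binary.Get.
getHeader :: Get Header
getHeader = Header <$> getWord32le <*> getWord16le

-- Run the parser over a lazy ByteString.
parseHeader :: BL.ByteString -> Header
parseHeader = runGet getHeader

main :: IO ()
main = do
  -- In-memory sample bytes; a file would be read with BL.readFile instead.
  let sample = BL.pack [1, 0, 0, 0, 2, 0]
  print (parseHeader sample)
```

For a file on disk, swapping the in-memory sample for the result of BL.readFile should feed the same parser from lazily read chunks.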

answered Oct 08 '22 06:10

Don Stewart

Considering that this post entails:

Haskell
code optimizations
performance benchmarks

...it's safe to say that I'm in way over my head. Nevertheless, I always learn something when I get in over my head, so here goes.

I went spelunking around the Data.ByteString.Lazy.* Haskell modules via Hoogle and found the length function for measuring the length of a lazy ByteString. It is implemented thus:

length :: ByteString -> Int64
length cs = foldlChunks (\n c -> n + fromIntegral (S.length c)) 0 cs

Hmm. Jon did say that "...Folding over chunks of file in the F# is a major part of why it is fast..." (my emphasis). And this length function appears to be implemented using a chunky fold as well. So it appears that this function is much more of an 'apples to apples' comparison to Jon's F# code.
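
To make the "chunky fold" idea concrete, here is a small sketch in the same style as the length implementation above, but counting zero bytes instead of all bytes. The name countZerosChunked is my own, and this is an illustration of the technique, not one of the programs timed in this answer.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- Fold over the strict chunks of a lazy ByteString, counting the zero
-- bytes in each chunk with the strict S.count and summing the results.
countZerosChunked :: BL.ByteString -> Int64
countZerosChunked = BL.foldlChunks step 0
  where
    -- The bang pattern keeps the running total evaluated at each step.
    step !n chunk = n + fromIntegral (S.count 0 chunk)

main :: IO ()
main = do
  -- Two explicit chunks stand in for the chunks produced by BL.readFile.
  let sample = BL.fromChunks [S.pack [0, 1, 0], S.pack [2, 0]]
  print (countZerosChunked sample)
```

The per-chunk work runs over a strict ByteString, so the lazy structure is only touched once per chunk rather than once per byte.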

Does it make a difference in practice? I compared Jon's example to the following:

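import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as B

-- Print the length in bytes of the file named by the first argument.
-- B.length folds over the lazy ByteString one chunk at a time.
main :: IO ()
main =
    getArgs
    >>= B.readFile . head
    >>= print . B.length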

Jon's Haskell example on my machine for a 1.2 GB file: 10.5s

The 'chunky' version: 1.1s

The 'chunky' version of the Haskell code is nearly ten times faster, which suggests that it is probably several times faster than Jon's optimized F# code.

EDIT

While I don't necessarily agree with all of Jon's criticisms of my example, I would like to make it as unimpeachable as possible. As such, I have profiled the following code:

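import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as B

-- Print the number of zero-valued bytes in the file named by the
-- first argument.
main :: IO ()
main =
    getArgs
    >>= B.readFile . head
    >>= print . B.count 0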

This code reads the contents of the target file into a lazy ByteString and then counts each occurrence of a 0-value byte. Unless I'm missing something, this program must load and evaluate each byte of the target file.

The above program runs consistently about 4x faster than the latest fastest Haskell program submitted by Jon, copied here for reference (in case it is updated):

import System.Environment (getArgs)
import Data.Int (Int64)
import qualified Data.ByteString.Lazy as B

-- Count every byte with a per-byte left fold (the byte value is ignored).
main :: IO ()
main =
    getArgs
    >>= B.readFile . head
    >>= print . B.foldl (\n _ -> n + 1) (0 :: Int64)
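
As a sanity check on what counting zero bytes with the library's count function computes, the sketch below compares it against a byte-at-a-time strict fold on a small in-memory sample. The name countZerosBytewise is hypothetical, and this only checks that the two agree on the result; it says nothing about their relative speed.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- Byte-at-a-time reference: visit every byte with a strict accumulator
-- and add one for each byte equal to zero.
countZerosBytewise :: BL.ByteString -> Int64
countZerosBytewise = BL.foldl' (\n w -> if w == 0 then n + 1 else n) 0

main :: IO ()
main = do
  let sample = BL.pack [0, 7, 0, 0, 255]
  -- Both components of the pair should be equal.
  print (BL.count 0 sample, countZerosBytewise sample)
```

On this sample, which contains three zero bytes, both counts should come out as 3.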

answered Oct 08 '22 05:10

Daniel Pratt
