Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell or F# high throughput binary I/O

Tags:

.net

io

haskell

f#

How good is the performance of binary I/O libraries in these two languages> I am contemplating about re-writing an ugly (yet very fast) C++ code that processes binary files of around 5-10GB using standard fread and fwrite functions. What slow-down factor should I expect for an optimized implementation in F# and Haskell?

EDIT: here is the C implementation of counting zero-bytes (buffer allocated on heap).

#include <stdio.h>
#include <stdlib.h>

#define SIZE 32*1024
int main(int argc, char* argv[])
{
    FILE *fp;
    char *buf;
    long i = 0, s = 0, l = 0;
    fp = fopen(argv[1], "rb");
    if (!fp) {
        printf("Openning %s failed\n", argv[1]);
        return -1;
    }
    buf = (char *) malloc(SIZE);
    while (!feof(fp)) {
        l = fread(buf, 1, SIZE, fp);
        for (i = 0; i &lt l; ++i) {
            if (buf[i] == 0) {
                ++s;
            }
        }
    }
    printf("%d\n", s);
    fclose(fp);
    free(buf);
    return 0;
}

The results:


$ gcc -O3 -o ioc io.c
$ ghc --make -O3 -o iohs io.hs
Linking iohs ...
$ time ./ioc 2.bin
462741044

real    0m16.171s
user    0m11.755s
sys     0m4.413s
$ time ./iohs 2.bin
4757708340

real    0m16.879s
user    0m14.093s
sys     0m2.783s
$ ls -lh 2.bin
-rw-r--r-- 1  14G Jan  4 10:05 2.bin
like image 978
user394460 Avatar asked Dec 31 '10 17:12

user394460


People also ask

Is F# better than Haskell?

But if you really want to learn functional programming, Haskell is a much better choice. If you just want to get something done and don't have much time to really learn Haskell, then F# is probably a better choice for you. It depends on how much time are you willing to invest.

Is Haskell still worth learning?

Yes, Haskell is worth learning in 2022 because functional languages like it are getting more popular among big companies like Facebook. Functional languages are typically ideal for big data and machine learning.

Why is Haskell not popular?

Haskell is not considered as a go-to programming language for obvious reasons: its powers and elegances differ from the requirements of most popular programming. Although it's a long way off in Haskell, runtime polymorphism is one of the most used patterns in modern programming.

Why is F# so popular?

F# has been lauded as both a great language for domain-driven development and data-driven development, and thanks to Fable, built by Alfonso Garcia-Caro, it can now be compiled into JavaScript, linking it with one of the world's most popular programming languages as well as JavaScript's widely installed base of devices ...


2 Answers

Haskell using lazy ByteString-based IO, with a "binary" parser should be around the same performance as C code doing the same job, on the same data types.

The key packages to be aware of:

  • bytestring
  • binary
like image 196
Don Stewart Avatar answered Oct 08 '22 06:10

Don Stewart


Considering that this post entails:

  • Haskell
  • code optimizations
  • performance benchmarks

...it's safe to say that I'm in way over my head. Nevertheless, I always learn something when I get in over my head, so here goes.

I went spelunking around the Data.ByteString.Lazy.* Haskell modules via Hoogle and found the length function for measuring the length of a lazy ByteString. It is implemented thus:

length :: ByteString -> Int64
length cs = foldlChunks (\n c -> n + fromIntegral (S.length c)) 0 cs

Hmm. Jon did say that "...Folding over chunks of file in the F# is a major part of why it is fast..." (my emphasis). And this length function appears to be implemented using a chunky fold as well. So it appears that this function is much more of an 'apples to apples' comparison to Jon's F# code.

Does it make a difference in practice? I compared Jon's example to the following:

import System
import Data.List
import Data.ByteString.Lazy as B

main =
    getArgs
    >>= B.readFile . Data.List.head
    >>= print . B.length

Jon's Haskell example on my machine for a 1.2 GB file: 10.5s

The 'chunky' version: 1.1s

The 'chunky' version of the Haskell code is nearly ten times faster. Which suggests that it is probably multiple times faster than Jon's optimized F# code.

EDIT

While I don't necessarily completely agree with Jon's criticisms of my example, I would like to make it as impeachable as possible. As such, I have profiled the following code:

import System
import Data.List
import Data.ByteString.Lazy as B

main =
    getArgs
    >>= B.readFile . Data.List.head
    >>= print . B.count 0

This code loads the contents of the target file into a ByteString and then 'counts' each occurence of a 0-value byte. Unless I'm missing something, this program must load and evaluate each byte of the target file.

The above program runs consistently about 4x faster than the latest fastest Haskell program submitted by Jon, copied here for reference (in case it is updated):

import System
import Data.Int
import Data.List
import Data.ByteString.Lazy as B

main =
    getArgs
    >>= B.readFile . Data.List.head
    >>= print . B.foldl (\n c -> n + 1) (0 :: Data.Int.Int64)
like image 28
Daniel Pratt Avatar answered Oct 08 '22 05:10

Daniel Pratt