How good is the performance of binary I/O libraries in these two languages? I am contemplating rewriting an ugly (yet very fast) C++ program that processes binary files of around 5-10 GB using the standard fread and fwrite functions. What slow-down factor should I expect from an optimized implementation in F# or Haskell?
EDIT: Here is the C implementation of counting zero bytes (buffer allocated on the heap).
#include <stdio.h>
#include <stdlib.h>

#define SIZE 32*1024

int main(int argc, char* argv[])
{
    FILE *fp;
    char *buf;
    long i = 0, s = 0, l = 0;

    fp = fopen(argv[1], "rb");
    if (!fp) {
        printf("Opening %s failed\n", argv[1]);
        return -1;
    }

    buf = (char *) malloc(SIZE);
    while (!feof(fp)) {
        l = fread(buf, 1, SIZE, fp);
        for (i = 0; i < l; ++i) {
            if (buf[i] == 0) {
                ++s;
            }
        }
    }
    printf("%ld\n", s);

    fclose(fp);
    free(buf);
    return 0;
}
The results:
$ gcc -O3 -o ioc io.c
$ ghc --make -O3 -o iohs io.hs
Linking iohs ...
$ time ./ioc 2.bin
462741044
real 0m16.171s
user 0m11.755s
sys 0m4.413s
$ time ./iohs 2.bin
4757708340
real 0m16.879s
user 0m14.093s
sys 0m2.783s
$ ls -lh 2.bin
-rw-r--r-- 1 14G Jan 4 10:05 2.bin
But if you really want to learn functional programming, Haskell is a much better choice. If you just want to get something done and don't have much time to really learn Haskell, then F# is probably the better choice for you. It depends on how much time you are willing to invest.
Haskell, using lazy ByteString-based I/O with a binary parser, should be around the same performance as C code doing the same job on the same data types.
The key packages to be aware of are bytestring (lazy ByteString I/O) and binary (binary parsing).
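To illustrate the "binary parser over lazy ByteString" combination, here is a minimal sketch. The file name data.bin and the choice of little-endian 32-bit words are my own assumptions for illustration, not from the post:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, runGet, getWord32le, isEmpty)
import Data.Word (Word32)

-- Decode the input as a stream of little-endian 32-bit words,
-- summing them with a strict accumulator as we go.
sumWords :: Get Word32
sumWords = go 0
  where
    go !acc = do
      done <- isEmpty
      if done
        then return acc
        else do
          w <- getWord32le
          go (acc + w)

main :: IO ()
main = BL.readFile "data.bin" >>= print . runGet sumWords
```

Because the ByteString is lazy, the file is read in chunks on demand rather than loaded whole into memory.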
Considering what this post entails, it's safe to say that I'm in way over my head. Nevertheless, I always learn something when I get in over my head, so here goes.
I went spelunking around the Data.ByteString.Lazy.* Haskell modules via Hoogle and found the length function for measuring the length of a lazy ByteString. It is implemented thus:
length :: ByteString -> Int64
length cs = foldlChunks (\n c -> n + fromIntegral (S.length c)) 0 cs
Hmm. Jon did say that "...Folding over chunks of file in the F# is a major part of why it is fast..." (my emphasis). And this length function appears to be implemented using a chunky fold as well. So this function seems to offer a much more 'apples to apples' comparison to Jon's F# code.
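The same chunky-fold pattern can be applied directly to the zero-counting problem. A minimal sketch (countZeros is my own name, not from the post), folding over the lazy ByteString's strict chunks and using the strict count on each:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)

-- Fold over the lazy ByteString's strict chunks, counting the
-- zero bytes in each chunk with the fast strict S.count.
countZeros :: L.ByteString -> Int64
countZeros = L.foldlChunks (\n c -> n + fromIntegral (S.count 0 c)) 0
```

Each chunk is processed by the strict ByteString machinery, so the per-byte work happens in tight inner loops rather than byte-at-a-time through the lazy interface.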
Does it make a difference in practice? I compared Jon's example to the following:
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as B

main =
    getArgs
        >>= B.readFile . head
        >>= print . B.length
Jon's Haskell example on my machine for a 1.2 GB file: 10.5s
The 'chunky' version: 1.1s
The 'chunky' version of the Haskell code is nearly ten times faster. Which suggests that it is probably multiple times faster than Jon's optimized F# code.
EDIT
While I don't necessarily completely agree with Jon's criticisms of my example, I would like to make it as unimpeachable as possible. As such, I have profiled the following code:
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as B

main =
    getArgs
        >>= B.readFile . head
        >>= print . B.count 0
This code loads the contents of the target file into a ByteString and then 'counts' each occurrence of a 0-value byte. Unless I'm missing something, this program must load and evaluate each byte of the target file.
The above program runs consistently about 4x faster than the latest fastest Haskell program submitted by Jon, copied here for reference (in case it is updated):
import System.Environment (getArgs)
import Data.Int (Int64)
import qualified Data.ByteString.Lazy as B

main =
    getArgs
        >>= B.readFile . head
        >>= print . B.foldl (\n c -> n + 1) (0 :: Int64)
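One caveat worth noting: B.foldl is the lazy left fold, so the accumulator in the program above can build up a chain of thunks. A sketch of the same byte count using the strict variant foldl' (countBytes is my own name, introduced for illustration):

```haskell
import System.Environment (getArgs)
import Data.Int (Int64)
import qualified Data.ByteString.Lazy as B

-- Strict left fold: the accumulator is forced at each step,
-- so no thunk chain accumulates while counting the bytes.
countBytes :: B.ByteString -> Int64
countBytes = B.foldl' (\n _ -> n + 1) 0

main :: IO ()
main = getArgs >>= B.readFile . head >>= print . countBytes
```

Whether this matters in practice depends on how well GHC's strictness analysis handles the lazy version; the strict fold simply removes that variable from the comparison.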