
Why is this "line count" program slow in Java? Using MappedByteBuffer

To try out MappedByteBuffer (memory-mapped files in Java), I wrote a simple wc -l (text file line count) demo:

int wordCount(String fileName) throws IOException {
    FileChannel fc = new RandomAccessFile(new File(fileName), "r").getChannel();
    MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());

    int nlines = 0;
    byte newline = '\n';

    for(long i = 0; i < fc.size(); i++) {
        if(mem.get() == newline)
            nlines += 1;
    }

    return nlines;
}

I tried this on a file of about 15 MB (15008641 bytes) with 100k lines. On my laptop, it takes about 13.8 seconds. Why is it so slow?

Complete class code is here: http://pastebin.com/t8PLRGMa

For reference, I wrote the same idea in C: http://pastebin.com/hXnDvZm6

It runs in about 28 ms, or 490 times faster.

Out of curiosity, I also wrote a Scala version using essentially the same algorithm and APIs as in Java. It runs 10 times faster, which suggests there is definitely something odd going on.

Update: The file is cached by the OS, so there is no disk loading time involved.

I wanted to use memory mapping for random access to bigger files which may not fit into RAM. That is why I am not just using a BufferedReader.
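
For illustration, here is the kind of access pattern I have in mind: mapping only a small window around an arbitrary offset, so the whole file never has to fit in memory (the helper name byteAt and the window sizes are just placeholders, not part of the actual program):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class RandomAccessSketch {
    // Read one byte at an arbitrary offset of a large file by mapping
    // only a small window around it instead of the whole file.
    static byte byteAt(Path file, long offset) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            long start = Math.max(0, offset - 4096);          // window starts up to 4 KiB before the offset
            long length = Math.min(8192, fc.size() - start);  // window length, clipped to the end of the file
            MappedByteBuffer window = fc.map(FileChannel.MapMode.READ_ONLY, start, length);
            return window.get((int) (offset - start));        // position relative to the window start
        }
    }
}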

asked Apr 02 '16 by cidermole

1 Answer

The code is very slow because fc.size() is called inside the loop.

The JVM obviously cannot eliminate the fc.size() call, since the file size can change at run time. Querying the file size is relatively slow, because it requires a system call to the underlying file system.

Change this to

    long size = fc.size();
    for (long i = 0; i < size; i++) {
        ...
    }
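
For reference, a minimal self-contained version of the method with that single change applied (the wrapping class LineCount and the try-with-resources are additions for the sake of a complete example; the logic otherwise follows the question's code):

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    class LineCount {
        static int wordCount(String fileName) throws IOException {
            try (FileChannel fc = new RandomAccessFile(new File(fileName), "r").getChannel()) {
                long size = fc.size();                 // query the file size once, outside the loop
                MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, size);

                int nlines = 0;
                byte newline = '\n';
                for (long i = 0; i < size; i++) {      // loop bound is now a plain local variable
                    if (mem.get() == newline)
                        nlines++;
                }
                return nlines;
            }
        }
    }
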
answered Nov 12 '22 by apangin