 

Parse files the fast way?

I am writing a graph library that should read the most common graph formats. One format contains information like this:

e 4 3
e 2 2
e 6 2
e 3 2
e 1 2
....

and I want to parse these lines. I looked around on Stack Overflow and found a neat solution to do this. I currently use an approach like this (file is an fstream):

string line;
while(getline(file, line)) {
    if(!line.length()) continue; // skip empty lines
    stringstream parseline(line);
    char identifier;
    parseline >> identifier; // read the first character
    if(identifier == 'e') {
        int n, m;
        parseline >> n;
        parseline >> m;
        foo(n, m); // here I handle the input
    }
}

It works well and as intended, but today when I tested it with huge graph files (50 MB+) I was shocked that this function was by far the worst bottleneck in the whole program:

The stringstream I use to parse the line takes almost 70% of the total runtime and the getline call another 25%. The rest of the program uses only 5%.

Is there a fast way to read those big files, possibly avoiding slow stringstreams and the getline function?

asked Dec 05 '25 by Listing


1 Answer

You can skip double-buffering your string, skip parsing the single character, and use strtoll to parse integers, like this:

string line;
while(getline(file, line)) {
    if(!line.length()) continue; // skip empty lines
    if(line[0] == 'e') {
        char *ptr;
        int n = strtoll(line.c_str()+2, &ptr, 10);
        int m = strtoll(ptr+1, &ptr, 10);
        foo(n, m); // here I handle the input
    }
}

In C++, strtoll is declared in the <cstdlib> header.

answered Dec 07 '25 by Sergey Kalinichenko


