
Fastest way to get data from a CSV in C++

I have a large CSV (75 MB approximately) of this kind:

1,2,4
5,2,0
1,6,3
8,3,1
...

And I store my data with this code:

#include <sstream>
#include <fstream>
#include <string>
#include <vector>

int main()
{
    char c; // to eat the commas

    int x, y, z;
    std::vector<int> xv, yv, zv;

    std::ifstream file("data.csv");
    std::string line;

    while (std::getline(file, line)) {
        std::istringstream ss(line);
        ss >> x >> c >> y >> c >> z;
        xv.push_back(x);
        yv.push_back(y);
        zv.push_back(z);
    }

    return 0;
}

On this large CSV (~75 MB), it takes:

real        0m7.389s
user        0m7.232s
sys         0m0.132s

That's far too slow!

Recently, from a Sublime Text snippet, I found another way to read a file:

#include <iostream>
#include <vector>
#include <cstdio>

int main()
{
    std::vector<char> v;

    if (FILE *fp = fopen("data.csv", "r")) {
        char buf[1024];
        while (size_t len = fread(buf, 1, sizeof(buf), fp))
            v.insert(v.end(), buf, buf + len);
        fclose(fp);
    }
}

On the same large CSV (~75 MB), without parsing the data, it takes:

real        0m0.118s
user        0m0.036s
sys         0m0.080s

That's a huge difference in time!

My question is: once the file is in a vector of chars, how can I get the data into the 3 vectors quickly? I don't know how to do it faster than with my first version.

Thank you very much! ^^

Manuel Ignacio López Quintero asked Feb 14 '14 11:02


2 Answers

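Store in the file how many rows it contains (for example, on the first line). Then, when loading, read that count first and reserve or resize the vectors to that size, so they don't have to reallocate repeatedly as they grow. It could reduce the time a bit.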

Dudev851 answered Sep 17 '22 12:09


Of course your second version will be much faster - it merely reads the file into memory, without parsing the values in it. The equivalent of the first version using C-style I/O would be along the lines of

if (FILE *fp = fopen("data.csv", "r")) {
    while (fscanf(fp, "%d,%d,%d", &x, &y, &z) == 3) {
        xv.push_back(x);
        yv.push_back(y);
        zv.push_back(z);
    }
    fclose(fp);
}

which, for me, is about three times faster than the C++-style version. But a C++ version without the intermediate stringstream

while (file >> x >> c >> y >> c >> z) {
    xv.push_back(x);
    yv.push_back(y);
    zv.push_back(z);
}

is almost as fast.
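If you do want to keep the approach of reading the whole file into a vector of chars first, one possible way to parse that buffer is with `std::strtol`. This is only a sketch, not something I have timed: `parse_buffer` is a made-up name, and it assumes well-formed rows of three integers that each fit in an `int`.

```cpp
#include <cstdlib>
#include <vector>

// Parses "x,y,z" rows out of a buffer that already holds the whole file.
// Note: appends a '\0' to buf, because std::strtol needs a terminator
// to know where the data ends.
void parse_buffer(std::vector<char>& buf,
                  std::vector<int>& xv,
                  std::vector<int>& yv,
                  std::vector<int>& zv)
{
    buf.push_back('\0');
    const char* p = buf.data();
    char* end = nullptr;

    for (;;) {
        int row[3];
        for (int i = 0; i < 3; ++i) {
            // strtol skips leading whitespace, including the newline
            // left over from the previous row.
            long value = std::strtol(p, &end, 10);
            if (end == p)
                return; // nothing converted: end of data or bad input
            row[i] = static_cast<int>(value); // assumes the value fits in int
            p = end;
            if (i < 2) {
                if (*p != ',')
                    return; // malformed row: stop parsing
                ++p;        // step over the comma
            }
        }
        xv.push_back(row[0]);
        yv.push_back(row[1]);
        zv.push_back(row[2]);
    }
}
```

Pass it the vector `v` filled by the `fread` loop from the question. It stops silently at the first malformed row, so add error reporting if your data isn't guaranteed to be clean.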

Mike Seymour answered Sep 18 '22 12:09