
Parsing tab separated data

Tags:

c++

I have a text file (~10GB) with the following format:

data1<TAB>data2<TAB>data3<TAB>data4<NEWLINE>

I want to scan through it and process only data2. What is the fastest way to extract data2 in C++?

EDIT: Added NEWLINE

Nemo asked Apr 24 '26 19:04


2 Answers

Read the file line by line. For each line, split on the tab. That will leave you with an array containing the fields, allowing you to work with the second field (data2).

Leons answered Apr 27 '26 07:04


This sounds like a job for a higher-level tool such as the shell utilities:

cut -f2           # from stdin
cut -f2 <my_file  # from file

But nonetheless, you can do that with C++ as well:

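#include <fstream>
#include <iostream>
#include <string>

void process(const std::string& data2);  // your per-record work, defined elsewhere

void parse(std::istream& in)
{
    std::string skip, data2;
    // Read from `in` (not std::cin) and test the extraction itself, so the
    // loop stops at end of input instead of processing a stale value.
    // operator>> splits on any whitespace, so fields must not contain spaces.
    while (in >> skip >> data2 >> skip >> skip) {
        process(data2);
    }
}

// ...
parse(std::cin);
std::ifstream file("my_file");
parse(file);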

wilhelmtell answered Apr 27 '26 07:04


