
C++ CSV line with commas and strings within double quotes

I'm reading a CSV file in C++ and each row has this format:

"Primary, Secondary, Third", "Primary", , "Secondary", 18, 4, 0, 0, 0

(notice the empty value)

When I do:

while (std::getline(ss, csvElement, ',')) {
   csvColumn.push_back(csvElement);
}

This splits the first quoted string into pieces, which isn't correct.

How do I keep the quoted string intact when iterating? I tried combining the above with extracting the text between double quotes, but I got wild results.

asked Feb 25 '16 at 21:02 by dimxasnewfrozen




2 Answers

std::quoted (C++14, declared in <iomanip>) lets you read quoted strings from input streams.

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::stringstream ss;
    ss << "\"Primary, Secondary, Third\", \"Primary\", , \"Secondary\", 18, 4, 0, 0, 0";

    while (ss >> std::ws) {
        std::string csvElement;

        if (ss.peek() == '"') {
            ss >> std::quoted(csvElement);
            std::string discard;
            std::getline(ss, discard, ',');
        }
        else {
            std::getline(ss, csvElement, ',');
        }

        std::cout << csvElement << "\n";
    }
}


The caveat is that a quoted string is only extracted if the first non-whitespace character of a value is a double quote. Additionally, any characters after the closing quote are discarded up to the next comma.
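To make the caveat concrete, here is a minimal sketch that wraps the same std::quoted technique in a function returning the fields. The name splitCsvLine is hypothetical, and the loop is restructured as a do/while on eof() so that a line ending in a comma yields a final empty field; that is my own choice, not part of the answer. Note also that std::quoted uses a backslash as its default escape character, so CSV-style doubled quotes ("") are not interpreted by this approach.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (the name is an assumption, not from the answer):
// splits one CSV line into fields using std::quoted for quoted values.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream ss(line);

    do {
        ss >> std::ws;                       // skip whitespace before the value
        std::string field;

        if (ss.peek() == '"') {
            ss >> std::quoted(field);        // read up to the closing quote, commas included
            std::string discard;
            std::getline(ss, discard, ',');  // drop anything between the closing quote and the comma
        } else {
            std::getline(ss, field, ',');    // unquoted (possibly empty) value
        }

        fields.push_back(field);
    } while (!ss.eof());                     // stop once the end of the line has been reached

    return fields;
}
```

Called on the row from the question, this should return nine fields, with the first being Primary, Secondary, Third and the third being empty. A value such as "a" junk comes back as just a, because the text after the closing quote is thrown away.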

answered Oct 17 '22 at 03:10 by user2093113


You need to interpret the comma depending on whether you're between double quotes or not. This is too complex for getline() alone.

The solution is to read the full line with getline(), then parse it by iterating through the string character by character while maintaining a flag that indicates whether you're between double quotes.

Here is a first "raw" example (double quotes are not removed in the fields and escape characters are not interpreted):

#include <iostream>
#include <string>
#include <vector>

int main() {
    std::string line;
    while (std::getline(std::cin, line)) {          // read full line
        std::vector<std::string> csvColumn;         // fields of the current line
        const char* mystart = line.c_str();         // start of the current field
        bool instring{false};                       // true while between double quotes
        for (const char* p = mystart; *p; p++) {    // iterate through the string
            if (*p == '"')                          // toggle flag at each double quote
                instring = !instring;
            else if (*p == ',' && !instring) {      // comma OUTSIDE double quotes
                csvColumn.push_back(std::string(mystart, p - mystart));  // keep the field
                mystart = p + 1;                    // and start parsing the next one
            }
        }
        csvColumn.push_back(std::string(mystart));  // last field ends at end of line, not at a comma
        for (const auto& field : csvColumn)         // print one field per line
            std::cout << field << "\n";
    }
}
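As a possible next step beyond the raw example, here is a sketch that also removes the surrounding quotes and interprets escapes. It assumes the common CSV convention that a doubled quote ("") inside a quoted field stands for one literal quote; the question does not say which escaping its data uses, so treat that as an assumption. The function name parseCsvLine is hypothetical. Whitespace before a value is skipped, and anything after a closing quote is ignored up to the next comma.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (name and escaping convention are assumptions):
// splits one CSV line, stripping quotes and turning "" into a literal ".
std::vector<std::string> parseCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;   // currently between an opening and a closing quote
    bool wasQuoted = false;  // current field was quoted and has already been closed or opened

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];

        if (inQuotes) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                field += '"';          // doubled quote -> one literal quote
                ++i;                   // skip the second quote of the pair
            } else if (c == '"') {
                inQuotes = false;      // closing quote, not stored
            } else {
                field += c;            // commas are ordinary characters here
            }
        } else if (c == ',') {
            fields.push_back(field);   // comma outside quotes ends the field
            field.clear();
            wasQuoted = false;
        } else if (wasQuoted) {
            // ignore anything between a closing quote and the next comma
        } else if (c == '"' && field.empty()) {
            inQuotes = true;           // opening quote, not stored
            wasQuoted = true;
        } else if ((c == ' ' || c == '\t') && field.empty()) {
            // skip whitespace before the value starts
        } else {
            field += c;
        }
    }

    fields.push_back(field);           // last field ends at end of line
    return fields;
}
```

For the row in the question this should give nine fields, the first being Primary, Secondary, Third without its quotes. An input such as "say ""hi""", x should give the two fields say "hi" and x.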


answered Oct 17 '22 at 02:10 by Christophe