I have tried some fixes mentioned in other answers, but they had no effect on my output. I was not planning on using Boost.Spirit, as I am not sure it is the best option for my needs. Also, the similar post does not deal with quoted material that contains commas, which is the last issue I need to resolve.
This is a C++ program that takes a CSV file as input. The file lists features of seals, with 23 values (columns) per entry. When I output rawdata[22] I expect to see the last value of the first row. Instead, I see the last value (Petitioned) followed by the first value (2055) of the next row. When I open the file in a hex editor, the two words are separated by a "." and the byte is 0a. I have tried setting \r, \n, and \r\n as delimiters, but none of them work. I cannot use "," as the only delimiter because commas also appear inside quoted strings; I tested it anyway and it did not fix the issue. How can I separate these values?
OUTPUT:
Petitioned
2055
SAMPLE INPUT:
SpeciesID,Kingdom,Phylum,Class,Order,Family,Genus,Species,Authority,Infraspecific rank,Infraspecific name,Infraspecific authority,Stock/subpopulation,Synonyms,Common names (Eng),Common names (Fre),Common names (Spa),Red List status,Red List criteria,Red List criteria version,Year assessed,Population trend,Petitioned
2055,ANIMALIA,CHORDATA,MAMMALIA,CARNIVORA,OTARIIDAE,Arctocephalus,australis,"(Zimmermann, 1783)",,,,,Arctophoca australis,South American Fur Seal,Otarie fourrure Australe,Oso Marino Austral,LC,,3.1,2016,increasing,N
41664,ANIMALIA,CHORDATA,MAMMALIA,CARNIVORA,OTARIIDAE,Arctocephalus,forsteri,"(Lesson, 1828)",,,,,Arctocephalus australis subspecies forsteri|Arctophoca australis subspecies forsteri,"New Zealand Fur Seal, Antipodean Fur Seal, Australasian Fur Seal, Black Fur Seal, Long-nosed Fur Seal, South Australian Fur Seal",,,LC,,3.1,2015,increasing,N
MY CODE:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;
int main() {
    string line;
    vector<string> rawdata;
    ifstream file ( "/Users/darla/Desktop/Programs/seals.csv" );
    if ( file.good() )
    {
        while(getline(file, line, '"')) {
            stringstream ss(line);
            while (getline(ss, line, ',')) {
                rawdata.push_back(line);
            }
            if (getline(file, line, '"')) {
                rawdata.push_back(line);
            }
        }
    }
    cout << rawdata[22] << endl;
    return 0;
}
This is far from a complete CSV parser and could be made more efficient, but it does the job: it parses your file correctly and copes with commas inside double-quoted fields.
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
int main()
{
    std::string line;
    std::vector<std::vector<std::string>> lines;
    std::ifstream file("/Users/darla/Desktop/Programs/seals.csv");
    if (file)
    {
        while (std::getline(file, line))
        {
            size_t n = lines.size();
            lines.resize(n + 1);
            std::istringstream ss(line);
            std::string field, push_field("");
            bool no_quotes = true;
            while (std::getline(ss, field, ','))
            {
                if (static_cast<size_t>(std::count(field.begin(), field.end(), '"')) % 2 != 0)
                {
                    no_quotes = !no_quotes;
                }
                push_field += field + (no_quotes ? "" : ",");
                if (no_quotes)
                {
                    lines[n].push_back(push_field);
                    push_field.clear();
                }
            }
        }
    }
    for (auto line : lines)
    {
        for (auto field : line)
        {
            std::cout << "| " << field << " |";
        }
        std::cout << std::endl << std::endl;
    }
    return 0;
}
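An explanation. Your original code reads the file with '"' as the delimiter, so newlines are treated as ordinary characters; the inner split on ',' then leaves "Petitioned", the newline, and "2055" together in one element. The program above instead reads the file line by line, splits each line on commas, and stores the results in a vector of vectors. If a piece contains an odd number of double quotes, a quoted field has been opened, so the following pieces are appended (with the commas restored) until a piece with the closing quote is found, and then the complete field is stored. If a piece contains an even number of double quotes, or none, and no quoted field is open, it is stored straight away. Note that the surrounding double quotes are kept in the stored field. Hope this helps.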
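If you would rather have the quotes removed from the stored fields, the same idea can be sketched as a character-by-character scan of each line. This is only an illustrative alternative, not the code above: the helper name split_csv_line is made up for this example, and it assumes that a quoted field never spans more than one line and that a doubled quote ("") inside a quoted field stands for a literal quote.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer's program): splits one CSV line
// into fields. Commas inside double quotes are kept as part of the field and
// the surrounding quotes are dropped.
// Assumption: a quoted field does not contain a line break.
std::vector<std::string> split_csv_line(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;

    for (std::size_t i = 0; i < line.size(); ++i)
    {
        const char c = line[i];
        if (in_quotes)
        {
            if (c != '"')
            {
                field += c;                 // ordinary character, commas included
            }
            else if (i + 1 < line.size() && line[i + 1] == '"')
            {
                field += '"';               // doubled quote means a literal quote
                ++i;
            }
            else
            {
                in_quotes = false;          // closing quote
            }
        }
        else if (c == '"')
        {
            in_quotes = true;               // opening quote
        }
        else if (c == ',')
        {
            fields.push_back(field);        // field separator
            field.clear();
        }
        else if (c != '\r')
        {
            field += c;                     // '\r' outside quotes is skipped (CRLF files)
        }
    }
    fields.push_back(field);                // last field has no trailing comma
    return fields;
}
```

To use it, read the file with std::getline(file, line) as in the program above and push split_csv_line(line) into the vector of vectors; for the sample input, element [22] of the first row should then be just Petitioned.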