Sorry for asking a question that many may think has already been asked.
I have a very long CSV data file (dat.csv) with 5 columns. I have another short CSV (filter.csv) file with 1 column.
Now, I only need to extract the rows from dat.csv where column 1 matches column 1 of filter.csv.
I would usually do this in Bash using sed/awk. However, for other reasons I need to do this within a C++ file. Can you suggest an efficient way to do this?
Sample Data:
data.csv
ID,Name,CountryCode,District,Population
3793,NewYork,USA,NewYork,8008278
3794,LosAngeles,USA,California,3694820
3795,Chicago,USA,Illinois,2896016
3796,Houston,USA,Texas,1953631
3797,Philadelphia,USA,Pennsylvania,1517550
3798,Phoenix,USA ,Arizona,1321045
3799,SanDiego,USA,California,1223400
3800,Dallas,USA,Texas,1188580
3801,SanAntonio,USA,Texas,1144646
filter.csv
3793
3797
3798
This .csv filtering library might help:
http://www.partow.net/programming/dsvfilter/index.html
You could merge the columns of both tables into one larger table, and then query the new table for rows where column 1 of table A equals column 1 of table B. Or maybe that library has functions for comparing tables.
Here are some tips:
The stream you read the data from needs to ignore the commas, so it should classify the comma character as whitespace using a std::ctype<char> facet imbued in its locale. Here's an example of modifying the classification table:
#include <locale>
#include <vector>

struct ctype : std::ctype<char>
{
private:
    static mask* get_table()
    {
        // Start from the classic "C" table, then mark ',' as whitespace
        static std::vector<mask> v(classic_table(),
                                   classic_table() + table_size);
        v[','] |= space;
        return &v[0];
    }
public:
    ctype() : std::ctype<char>(get_table()) { }
};
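To make the effect of the facet concrete, here is a minimal sketch that imbues it on a std::istringstream and splits one record with operator>>. The names comma_is_space and split_fields are hypothetical and chosen only for this illustration. It assumes fields contain no embedded spaces or quoted commas, and it silently skips empty fields.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet name: classifies ',' as whitespace on top of the "C" table.
struct comma_is_space : std::ctype<char> {
    comma_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        // Copy the classic table once, then add the space bit for ','.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>(',')] |= space;
        return table.data();
    }
};

// Hypothetical helper: split one CSV record into its fields.
std::vector<std::string> split_fields(const std::string& record) {
    std::istringstream iss(record);
    // The locale takes ownership of the facet and deletes it later.
    iss.imbue(std::locale(iss.getloc(), new comma_is_space));

    std::vector<std::string> fields;
    for (std::string field; iss >> field; ) {
        fields.push_back(field);
    }
    return fields;
}
```

Because operator>> skips any run of whitespace, a record such as "a,,b" would yield two fields rather than three, so this approach only suits data without empty columns.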
Read the first .csv file line by line (using std::getline()). Extract the first word of each line and compare it with the values extracted from the second .csv file. Continue until you reach the end of the first file:
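#include <algorithm>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// comma_whitespace is assumed to be defined before main()
int main()
{
    std::ifstream in1("test1.csv");
    std::ifstream in2("test2.csv");
    typedef std::istream_iterator<std::string> It;
    in1 >> comma_whitespace;
    in2 >> comma_whitespace;
    // Extra parentheses keep this from being parsed as a function declaration
    std::vector<std::string> in2_content((It(in2)), It());
    std::vector<std::string> matches;
    std::string line;
    while (std::getline(in1, line))
    {
        std::istringstream iss(line);
        iss >> comma_whitespace; // the per-line stream must also treat ',' as whitespace
        It beg(iss);
        // Skip empty lines, where there is no first word to dereference
        if (beg != It() &&
            std::find(in2_content.begin(),
                      in2_content.end(), *beg) != in2_content.end())
        {
            matches.push_back(line);
        }
    }
}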
// After the above, the vector matches should hold all the rows that
// have the same ID number as in the second csv file
comma_whitespace is a manipulator which changes the stream's locale to one that uses the custom ctype facet defined above.
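The manipulator itself is not shown in this answer, so the following is only one possible way to write it, under the assumption that it is a plain function taking and returning std::istream&. The facet is repeated under the hypothetical name comma_ctype so the sketch compiles on its own, and first_field is an illustrative helper, not part of the original code.

```cpp
#include <istream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet name; same idea as the custom ctype described above.
struct comma_ctype : std::ctype<char> {
    comma_ctype() : std::ctype<char>(table()) {}

    static const mask* table() {
        static std::vector<mask> t(classic_table(),
                                   classic_table() + table_size);
        t[static_cast<unsigned char>(',')] |= space;  // ',' now counts as whitespace
        return t.data();
    }
};

// Assumed shape of the manipulator: `in >> comma_whitespace` imbues the facet.
std::istream& comma_whitespace(std::istream& in) {
    in.imbue(std::locale(in.getloc(), new comma_ctype));
    return in;
}

// Illustrative helper: first comma-separated field of a record, or "" if none.
std::string first_field(const std::string& record) {
    std::istringstream iss(record);
    iss >> comma_whitespace;
    std::string field;
    iss >> field;
    return field;
}
```

A function with this signature works with operator>> because input streams provide an overload that accepts a pointer to such a function and calls it on the stream.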
Disclaimer: I haven't tested this code.
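Since the question asks for an efficient approach and the data file is very long, here is an alternative sketch that swaps std::find over a vector for a std::unordered_set lookup and takes the ID as the text before the first comma, with no custom locale. The function names read_ids and filter_rows are hypothetical. It assumes the ID is always the first field, is unquoted, and has no surrounding spaces.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without files
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: read whitespace-separated IDs from the filter stream.
std::unordered_set<std::string> read_ids(std::istream& filter) {
    std::unordered_set<std::string> ids;
    for (std::string id; filter >> id; ) {
        ids.insert(id);
    }
    return ids;
}

// Hypothetical helper: keep each data row whose first field is in `ids`.
std::vector<std::string> filter_rows(std::istream& data,
                                     const std::unordered_set<std::string>& ids) {
    std::vector<std::string> matches;
    for (std::string line; std::getline(data, line); ) {
        // Text before the first comma; the whole line if there is no comma.
        const std::string key = line.substr(0, line.find(','));
        if (ids.count(key) != 0) {
            matches.push_back(line);
        }
    }
    return matches;
}
```

Both functions take std::istream&, so they can be fed from std::ifstream objects opened on the two files. The header row is dropped automatically as long as "ID" is not listed in the filter file.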