I have data in the following format:
4:How do you do?
10:Happy birthday
1:Purple monkey dishwasher
200:The Ancestral Territorial Imperatives of the Trumpeter Swan
The number can be anywhere from 1 to 999, and the string is at most 255 characters long. I'm new to C++ and it seems a few sources recommend extracting formatted data with a stream's >> operator, but when I want to extract a string it stops at the first whitespace character. Is there a way to configure a stream to stop parsing a string only at a newline or end-of-file? I saw that there was a getline method to extract an entire line, but then I still have to split it up manually (with find_first_of), don't I?
Is there an easy way to parse data in this format using only STL?
You've already been told about std::getline, but they didn't mention one detail that you'll probably find useful: when you call getline, you can also pass a parameter telling it which character to treat as the delimiter instead of the newline. To read your number and then the rest of the line, you can use:
std::string number;
std::string name;
std::getline(infile, number, ':');
std::getline(infile, name);
This will put the data up to the ':' into number, discard the ':', and read the rest of the line into name.
If you want to use >> to read the data, you can do that too, but it's a bit more difficult, and delves into an area of the standard library that most people never touch. A stream has an associated locale that's used for things like formatting numbers and (importantly) determining what constitutes "white space". You can define your own locale that treats the ":" as white space, and the space (" ") as not white space. Tell the stream to use that locale, and it'll let you read your data directly.
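#include <locale>
#include <vector>
// A ctype facet whose classification table marks only ':' and '\n' as
// white space, so >> into a string keeps ordinary spaces.
struct colonsep: std::ctype<char> {
    colonsep(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
        // Start with every character unclassified, i.e. not white space.
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::mask());
        rc[':'] = std::ctype_base::space;
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};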
Now to use it, we "imbue" the stream with a locale:
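#include <fstream>
#include <iterator>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
// colonsep (and <locale>, <vector>) come from the previous snippet.
typedef std::pair<int, std::string> data;
// These overloads live in namespace std so that argument-dependent lookup
// finds them from inside the stream iterators. Strictly speaking, the
// standard doesn't allow adding overloads there; wrapping the pair in your
// own struct avoids the issue.
namespace std {
    std::istream &operator>>(std::istream &is, data &d) {
        return is >> d.first >> d.second;
    }
    std::ostream &operator<<(std::ostream &os, data const &d) {
        return os << d.first << ":" << d.second;
    }
}
int main() {
    std::ifstream infile("testfile.txt");
    // The locale takes ownership of the facet, so there's no delete.
    infile.imbue(std::locale(std::locale(), new colonsep));
    std::vector<data> d;
    std::copy(std::istream_iterator<data>(infile),
              std::istream_iterator<data>(),
              std::back_inserter(d));
    // just for fun, sort the data to show we can manipulate it:
    std::sort(d.begin(), d.end());
    std::copy(d.begin(), d.end(), std::ostream_iterator<data>(std::cout, "\n"));
    return 0;
}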
Now you know why that part of the library is so neglected. In theory, getting the standard library to do your work for you is great -- but in fact, most of the time it's easier to do this kind of job on your own instead.
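For comparison, here is a rough sketch of what "doing the job on your own" might look like for a single line that has already been read with std::getline. The function name parse_line is hypothetical, and the 1 to 999 range check is taken from the question rather than from anything the library enforces.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (an illustration, not the original answer's code):
// splits one "number:text" line at the first ':' and validates the number.
// Returns false, leaving `out` unchanged, if the line is malformed.
bool parse_line(const std::string& line, std::pair<int, std::string>& out) {
    std::size_t colon = line.find(':');
    if (colon == std::string::npos || colon == 0) {
        return false;  // no separator, or nothing before it
    }
    int value = 0;
    for (std::size_t i = 0; i < colon; ++i) {
        char c = line[i];
        if (c < '0' || c > '9') {
            return false;  // non-digit in the number field
        }
        value = value * 10 + (c - '0');
        if (value > 999) {
            return false;  // outside the range stated in the question
        }
    }
    if (value < 1) {
        return false;
    }
    out.first = value;
    out.second = line.substr(colon + 1);  // may itself contain ':'
    return true;
}
```

This trades the stream machinery for explicit checks, so a malformed line is reported through the return value instead of putting the stream into a failed state.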