I have a C-string which contains a list of floating numbers separated by commas and spaces. Each pair of numbers is separated by one (or more) spaces and represents a point where the x and y fields are separated by a comma (and optionally by spaces).
" 10,9 2.5, 3 4 ,150.32 "
I need to parse this string in order to fill a list of Point(x, y)
.
Following is my current implementation:
const char* strPoints = getString();
std::istringstream sstream(strPoints);
float x, y;
char comma;
while (sstream >> x >> comma >> y)
{
myList.push(Point(x, y));
}
Since I need to parse a lot (up to 500,000) of these strings I'm wondering if there is a faster solution.
Look at Boost Spirit:
It supports NaN, positive and negative infinity just fine. Also it allows you to express the constraining grammar succinctly.
Here is the adapted sample for your grammar:
struct Point { float x,y; };
typedef std::vector<Point> data_t;
// And later:
bool ok = phrase_parse(f,l,*(double_ > ',' > double_), space, data);
The iterators can be any iterators. So you can hook it up with your C string just fine.
Here's a straight adaptation of the linked benchmark case. This shows you how to parse from any std::istream
or directly from a memory mapped file.
Live On Coliru
Here's a version that doesn't need to know the length of the string up front (this is neat because it avoids the strlen
call in case you didn't have the length available):
template <typename OI>
static inline void parse_points(OI out, char const* it, char const* last = std::numeric_limits<char const*>::max()) {
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
bool ok = qi::phrase_parse(it, last,
*(qi::double_ >> ',' >> qi::double_) [ *phx::ref(out) = phx::construct<Point>(qi::_1, qi::_2) ],
qi::space);
if (!ok || !(it == last || *it == '\0')) {
throw it; // TODO proper error reporting?
}
}
Note how I made it take an output iterator so that you get to decide how to accumulate the results. The obvious wrapper to /just/ parse to a vector would be:
static inline data_t parse_points(char const* szInput) {
data_t pts;
parse_points(back_inserter(pts), szInput);
return pts;
}
But you can also do different things (like append to an existing container, that could have reserved a known capacity up front etc.). Things like this often allow truly optimized integration in the end.
Here's that code fully demo-ed in ~30 lines of essential code:
Live On Coliru
To show off the flexibility of this parser; if you just wanted to check the input and get a count of the points, you can replace the output iterator with a simple lambda function that increments a counter instead of adds a newly constructed point.
int main() {
int count = 0;
parse_points( " 10,9 2.5, 3 4 ,150.32 ", boost::make_function_output_iterator([&](Point const&){count++;}));
std::cout << "elements in sample: " << count << "\n";
}
Live On Coliru
Since everything is inlined the compiler will notice that the whole Point
doesn't need to be constructed here and eliminate that code: http://paste.ubuntu.com/9781055/
The main function is seen directly invoking the very parser primitives. Handcoding the parser won't get you better tuning here, at least not without a lot of effort.
I got much better performance parsing out the points using a combination of std::find and std::strtof and the code wasn't much more complicated. Here's the test I ran:
#include <iostream>
#include <sstream>
#include <random>
#include <chrono>
#include <cctype>
#include <algorithm>
#include <cstdlib>
#include <forward_list>
struct Point { float x; float y; };
using PointList = std::forward_list<Point>;
using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;
std::string generate_points(int n) {
static auto random_generator = std::mt19937{std::random_device{}()};
std::ostringstream oss;
std::uniform_real_distribution<float> distribution(-1, 1);
for (int i=0; i<n; ++i) {
oss << distribution(random_generator) << " ," << distribution(random_generator) << "\t \n";
}
return oss.str();
}
PointList parse_points1(const char* s) {
std::istringstream iss(s);
PointList points;
float x, y;
char comma;
while (iss >> x >> comma >> y)
points.push_front(Point{x, y});
return points;
}
inline
std::tuple<Point, const char*> parse_point2(const char* x_first, const char* last) {
auto is_whitespace = [](char c) { return std::isspace(c); };
auto x_last = std::find(x_first, last, ',');
auto y_first = std::find_if_not(std::next(x_last), last, is_whitespace);
auto y_last = std::find_if(y_first, last, is_whitespace);
auto x = std::strtof(x_first, (char**)&x_last);
auto y = std::strtof(y_first, (char**)&y_last);
auto next_x_first = std::find_if_not(y_last, last, is_whitespace);
return std::make_tuple(Point{x, y}, next_x_first);
}
PointList parse_points2(const char* i, const char* last) {
PointList points;
Point point;
while (i != last) {
std::tie(point, i) = parse_point2(i, last);
points.push_front(point);
}
return points;
}
int main() {
auto s = generate_points(500000);
auto time0 = Clock::now();
auto points1 = parse_points1(s.c_str());
auto time1 = Clock::now();
auto points2 = parse_points2(s.data(), s.data() + s.size());
auto time2 = Clock::now();
std::cout << "using stringstream: "
<< std::chrono::duration_cast<milliseconds>(time1 - time0).count() << '\n';
std::cout << "using strtof: "
<< std::chrono::duration_cast<milliseconds>(time2 - time1).count() << '\n';
return 0;
}
outputs:
using stringstream: 1262
using strtof: 120
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With