Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parse a C-string of floating numbers

I have a C-string which contains a list of floating numbers separated by commas and spaces. Each pair of numbers is separated by one (or more) spaces and represents a point where the x and y fields are separated by a comma (and optionally by spaces).

" 10,9 2.5, 3   4 ,150.32 "

I need to parse this string in order to fill a list of Point(x, y).
Following is my current implementation:

const char* strPoints = getString();
std::istringstream sstream(strPoints);

float x, y;
char comma;

while (sstream >> x >> comma >> y)
{
   myList.push(Point(x, y));
}

Since I need to parse a lot (up to 500,000) of these strings I'm wondering if there is a faster solution.

like image 271
Nick Avatar asked Jan 16 '15 19:01

Nick


Video Answer


2 Answers

Look at Boost Spirit:

  • How to parse space-separated floats in C++ quickly?

It supports NaN, positive and negative infinity just fine. Also it allows you to express the constraining grammar succinctly.

  1. Simple adaptation of the code

    Here is the adapted sample for your grammar:

    struct Point { float x,y; };
    typedef std::vector<Point> data_t;
    
    // And later:
    bool ok = phrase_parse(f,l,*(double_ > ',' > double_), space, data);
    

    The iterators can be any iterators. So you can hook it up with your C string just fine.

    Here's a straight adaptation of the linked benchmark case. This shows you how to parse from any std::istream or directly from a memory mapped file.

    Live On Coliru


  2. Further optimizations (strictly for C strings)

    Here's a version that doesn't need to know the length of the string up front (this is neat because it avoids the strlen call in case you didn't have the length available):

    template <typename OI>
    static inline void parse_points(OI out, char const* it, char const* last = std::numeric_limits<char const*>::max()) {
        namespace qi  = boost::spirit::qi;
        namespace phx = boost::phoenix;
    
        bool ok = qi::phrase_parse(it, last,
                *(qi::double_ >> ',' >> qi::double_) [ *phx::ref(out) = phx::construct<Point>(qi::_1, qi::_2) ],
                qi::space);
    
        if (!ok || !(it == last || *it == '\0')) {
            throw it; // TODO proper error reporting?
        }
    }
    

    Note how I made it take an output iterator so that you get to decide how to accumulate the results. The obvious wrapper to /just/ parse to a vector would be:

    static inline data_t parse_points(char const* szInput) {
        data_t pts;
        parse_points(back_inserter(pts), szInput);
        return pts;
    }
    

    But you can also do different things (like append to an existing container, that could have reserved a known capacity up front etc.). Things like this often allow truly optimized integration in the end.

    Here's that code fully demo-ed in ~30 lines of essential code:

    Live On Coliru

  3. Extra Awesome Bonus

    To show off the flexibility of this parser; if you just wanted to check the input and get a count of the points, you can replace the output iterator with a simple lambda function that increments a counter instead of adds a newly constructed point.

    int main() {
        int count = 0;
        parse_points( " 10,9 2.5, 3   4 ,150.32    ", boost::make_function_output_iterator([&](Point const&){count++;}));
        std::cout << "elements in sample: " << count << "\n";
    }
    

    Live On Coliru

    Since everything is inlined the compiler will notice that the whole Point doesn't need to be constructed here and eliminate that code: http://paste.ubuntu.com/9781055/

    The main function is seen directly invoking the very parser primitives. Handcoding the parser won't get you better tuning here, at least not without a lot of effort.

like image 128
sehe Avatar answered Oct 27 '22 02:10

sehe


I got much better performance parsing out the points using a combination of std::find and std::strtof and the code wasn't much more complicated. Here's the test I ran:

#include <iostream>                                                                             
#include <sstream>                                                                              
#include <random>                                                                               
#include <chrono>                                                                               
#include <cctype>                                                                               
#include <algorithm>                                                                            
#include <cstdlib>                                                                              
#include <forward_list>                                                                         

struct Point { float x; float y; };                                                             
using PointList = std::forward_list<Point>;                                                     
using Clock = std::chrono::steady_clock;                                                        
using std::chrono::milliseconds;                                                                

std::string generate_points(int n) {                                                            
  static auto random_generator = std::mt19937{std::random_device{}()};                          
  std::ostringstream oss;                                                                       
  std::uniform_real_distribution<float> distribution(-1, 1);                                    
  for (int i=0; i<n; ++i) {                                                                     
    oss << distribution(random_generator) << " ," << distribution(random_generator) << "\t \n"; 
  }                                                                                             
  return oss.str();                                                                             
}                                                                                               

PointList parse_points1(const char* s) {                                                        
  std::istringstream iss(s);                                                                    
  PointList points;                                                                             
  float x, y;                                                                                   
  char comma;                                                                                   
  while (iss >> x >> comma >> y)                                                                
       points.push_front(Point{x, y});                                                          
  return points;                                                                                
}                                                                                               

inline                                                                                          
std::tuple<Point, const char*> parse_point2(const char* x_first, const char* last) {            
  auto is_whitespace = [](char c) { return std::isspace(c); };                                  
  auto x_last  = std::find(x_first, last, ',');                                                 
  auto y_first = std::find_if_not(std::next(x_last), last, is_whitespace);                      
  auto y_last  = std::find_if(y_first, last, is_whitespace);                                    
  auto x = std::strtof(x_first, (char**)&x_last);                                               
  auto y = std::strtof(y_first, (char**)&y_last);                                               
  auto next_x_first = std::find_if_not(y_last, last, is_whitespace);                            
  return std::make_tuple(Point{x, y}, next_x_first);                                            
}                                                                                               

PointList parse_points2(const char* i, const char* last) {                                      
  PointList points;                                                                             
  Point point;                                                                                  
  while (i != last) {                                                                           
    std::tie(point, i) = parse_point2(i, last);                                                 
    points.push_front(point);                                                                   
  }                                                                                             
  return points;                                                                                
}                                                                                               

int main() {                                                                                    
  auto s = generate_points(500000);                                                             
  auto time0 = Clock::now();                                                                    
  auto points1 = parse_points1(s.c_str());                                                      
  auto time1 = Clock::now();                                                                    
  auto points2 = parse_points2(s.data(), s.data() + s.size());                                  
  auto time2 = Clock::now();                                                                    
  std::cout << "using stringstream: "                                                           
            << std::chrono::duration_cast<milliseconds>(time1 - time0).count() << '\n';         
  std::cout << "using strtof: "                                                                 
            << std::chrono::duration_cast<milliseconds>(time2 - time1).count() << '\n';         
  return 0;                                                                                     
}                                  

outputs:

using stringstream: 1262
using strtof: 120
like image 20
Ryan Burn Avatar answered Oct 27 '22 01:10

Ryan Burn