In C++11 what is the most performant way to return a reference/pointer to a position in a std::string?

I'm building a text parser that uses std::string as the core storage for strings.

I know this is not optimal and that parsers inside compilers use optimized approaches for this. In my project I don't mind losing some performance in exchange for more clarity and easier maintenance.

At the beginning I read a huge text into memory and then scan each character to build an ordered set of tokens; it's a simple lexer. Currently I'm using std::string to represent the text of each token, but I would like to improve this a bit by using a reference/pointer into the original text.

From what I have read, it is bad practice to return and hold on to iterators, and it is also bad practice to refer to std::string's internal buffer.

Any suggestions on how to accomplish this in a "clean" way?

406

asked Jul 15 '14 15:07

Pedro Salgueiro

3 Answers

There are proposals to add string_view to C++ in an upcoming standard.

A string_view is a non-owning iterable range over characters with many of the utilities and properties you'd expect of a string class, except you cannot insert/delete characters (and editing characters is often blocked in some subtypes).

I would advise trying that approach -- write your own (in your own utility namespace). (You should have your own utility namespace for reusable code snippets anyhow).

The core data is a pair of char* or std::string::iterators (or const versions). If the user needs a null-terminated buffer, a to_string method allocates one. I would start with non-mutable (const) character data. Do not forget begin and end: that makes your view iterable with for(:) loops.
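As a rough illustration of the design described above, here is a minimal C++11 sketch of such a view. The namespace name util, the member names, and the choice of const char* as the underlying pair are my assumptions for the example; this is not the interface of the proposed standard string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace util {  // hypothetical utility namespace

// Minimal non-owning, read-only view over a run of characters.
// The viewed buffer must outlive every view that refers to it.
class string_view {
public:
    string_view() : first_(nullptr), last_(nullptr) {}

    string_view(const char* first, const char* last)
        : first_(first), last_(last) {
        assert(first <= last);
    }

    // Views the whole string; the view dangles if `s` is modified or destroyed.
    explicit string_view(const std::string& s)
        : first_(s.data()), last_(s.data() + s.size()) {}

    // begin/end make the view usable in range-based for loops.
    const char* begin() const { return first_; }
    const char* end() const { return last_; }

    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }
    bool empty() const { return first_ == last_; }

    // Sub-view of at most `count` characters starting at `pos`; no allocation.
    string_view substr(std::size_t pos, std::size_t count) const {
        assert(pos <= size());
        const std::size_t avail = size() - pos;
        const std::size_t n = count < avail ? count : avail;
        return string_view(first_ + pos, first_ + pos + n);
    }

    // Allocates an owning, null-terminated copy only when one is needed.
    std::string to_string() const { return std::string(first_, last_); }

private:
    const char* first_;
    const char* last_;
};

}  // namespace util
```

A lexer could then hand out util::string_view values from substr instead of std::string copies, and call to_string only at the points where an owning string is really required.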

This design has the danger that the original std::string has to persist long enough to outlast all of the views.

If you are willing to give up some performance for safety, have the view own a std::shared_ptr<const std::string> that it can move a std::string into. As a first step, move the entire buffer into it, and then start chopping/parsing it down. (Child views make a new shared pointer to the same data.) Then your view class is more like a non-mutable string with shared storage.
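A minimal sketch of that shared-storage variant might look like the following. The class name shared_string_view and the offset/length representation are assumptions made for the example, not part of any standard or proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

namespace util {  // hypothetical utility namespace

// Read-only view that shares ownership of the underlying text, so a
// child view stays valid even after the view it was cut from is gone.
class shared_string_view {
public:
    // Takes ownership of the whole text by moving it into shared storage.
    explicit shared_string_view(std::string text)
        : storage_(std::make_shared<std::string>(std::move(text))),
          pos_(0),
          len_(storage_->size()) {}

    std::size_t size() const { return len_; }
    const char* begin() const { return storage_->data() + pos_; }
    const char* end() const { return begin() + len_; }

    // Child view: same storage, narrower window. Copying the shared_ptr
    // costs a reference-count update, which is the performance trade-off.
    shared_string_view substr(std::size_t pos, std::size_t count) const {
        assert(pos <= len_);
        const std::size_t avail = len_ - pos;
        const std::size_t n = count < avail ? count : avail;
        return shared_string_view(storage_, pos_ + pos, n);
    }

    // Forwards to std::string::substr to produce an owning copy.
    std::string to_string() const { return storage_->substr(pos_, len_); }

private:
    shared_string_view(std::shared_ptr<const std::string> storage,
                       std::size_t pos, std::size_t len)
        : storage_(std::move(storage)), pos_(pos), len_(len) {}

    // Declared first so it is initialized before len_ reads its size.
    std::shared_ptr<const std::string> storage_;
    std::size_t pos_;
    std::size_t len_;
};

}  // namespace util
```

Storing an offset and a length rather than raw pointers is one possible choice here; it makes it straightforward to forward operations to the underlying std::string.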

The upsides to the shared_ptr<const> version include safety, longer lifetime of the views (there is no more lifetime dependency), and you can easily forward your const "substring" type methods to the std::string so you can write less code.

Downsides include possible incompatibility with the upcoming standard one¹, and lower performance because you are dragging a shared_ptr around.

I suspect views and ranges are going to be increasingly important in modern C++ with the upcoming and recent improvements to the language.

boost::string_ref is apparently an implementation of a proposal to the C++1y standard.

¹ However, given how simple it is to add capabilities in template metaprogramming, having a "resource owner" template argument to a view type might be a good design decision. Then you can have owning and non-owning string_views with otherwise identical semantics...

190

answered Nov 10 '22 00:11

Yakk - Adam Nevraumont

Some thoughts here:

-The string's internal representation lives exactly as long as the string itself. If you save pointers or iterators into the string to use later (e.g. printing reports, postprocessing) beyond the scope of the string, you will hit invalid memory accesses. Normally in this type of processing the text lives for the whole duration of the process.
-Iterators are a good choice. (For extreme performance and generality I suggest using a raw pointer to const, const char*, because the source could be almost anything: a string, a buffer, a memory-mapped buffer, data read from a stream, etc.)
-A good practice is, instead of copying the tokens, to save a pair (token begin iterator, token end iterator) in a collection of tokens.
-For performance it is imperative to avoid making a lot of allocations (allocation is one of the most expensive operations in any language).

You could check lexertl (for more ideas, or to use it directly): http://www.benhanson.net/lexertl.html and spirit (more complete): http://www.boost.org/doc/libs/release/libs/spirit/
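
The points above could be sketched roughly as follows. The Token struct, the tokenize function, the whitespace-only splitting rule, and the reserve heuristic are all assumptions made for this example; a real lexer would have its own token rules.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type: a half-open range [begin, end) into the source
// text. It owns nothing, so the source buffer must outlive the tokens.
struct Token {
    const char* begin;
    const char* end;

    std::size_t size() const { return static_cast<std::size_t>(end - begin); }
    std::string str() const { return std::string(begin, end); }  // copies on demand
};

// Splits [first, last) on whitespace. Taking raw pointers means the source
// can be a std::string, a plain buffer, or a memory-mapped file.
std::vector<Token> tokenize(const char* first, const char* last) {
    assert(first <= last);
    std::vector<Token> tokens;
    // Rough guess at the token count to limit vector reallocations.
    tokens.reserve(static_cast<std::size_t>(last - first) / 4 + 1);

    const char* p = first;
    while (p != last) {
        // Skip leading whitespace.
        while (p != last && std::isspace(static_cast<unsigned char>(*p))) ++p;
        const char* start = p;
        // Consume one run of non-whitespace characters.
        while (p != last && !std::isspace(static_cast<unsigned char>(*p))) ++p;
        if (start != p) tokens.push_back(Token{start, p});
    }
    return tokens;
}
```

With a std::string named text, the call would be tokenize(text.data(), text.data() + text.size()); no per-token string is allocated unless str() is called.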

answered Nov 10 '22 01:11

NetVipeC

Returning and using iterators is not bad practice, assuming of course that you are not modifying the input buffer, and it does not look like you are.
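
For instance, a scanning helper can simply return a std::string::const_iterator. The function name skip_identifier and its identifier rule are made up for this sketch, and it assumes the string is neither modified nor destroyed while the returned iterator is in use.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: starting at `from`, returns an iterator to the first
// character that is not a letter, digit, or underscore (or text.end()).
// The result is only valid while `text` is alive and unmodified.
std::string::const_iterator skip_identifier(const std::string& text,
                                            std::string::const_iterator from) {
    assert(from >= text.begin() && from <= text.end());
    while (from != text.end() &&
           (std::isalnum(static_cast<unsigned char>(*from)) || *from == '_')) {
        ++from;
    }
    return from;
}
```

The caller can then form the token as the range from its starting iterator to the returned one, without copying any characters.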

answered Nov 10 '22 00:11

Wojtek Surowka

Tags: c++, c++11, stdstring