
What is the difference between regex_token_iterator and regex_iterator?

Tags: c++, regex

Is there any difference between regex_token_iterator and regex_iterator?

They seem to do the same job, but I am not sure which one performs better.

Clax asked Oct 12 '14 01:10



1 Answer

There is indeed a difference. cppreference describes std::regex_iterator as follows:

std::regex_iterator is a read-only ForwardIterator that accesses the individual matches of a regular expression within the underlying character sequence.

and std::regex_token_iterator as:

std::regex_token_iterator is a read-only ForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).

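So std::regex_iterator yields one complete match per step, while std::regex_token_iterator yields individual sub-matches and can additionally yield the non-matched parts of the sequence (the tokens between matches) or the n-th sub-expression of each match.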

The cppreference page for std::regex_token_iterator quoted above also describes a typical implementation:

A typical implementation of std::regex_token_iterator holds the underlying std::regex_iterator, a container (e.g. std::vector) of the requested submatch indexes, the internal counter equal to the index of the submatch, a pointer to std::sub_match, pointing at the current submatch of the current match, and a std::match_results object containing the last non-matched character sequence (used in tokenizer mode).

The book The C++ Standard Library explains this in section 14.4, Regex Token Iterators, as follows:

A regex iterator helps to iterate over matched subsequences. However, sometimes you also want to process all the contents between matched expressions. [...] In addition, you can specify a list of integral values, which represent elements of a “tokenization”:

  • -1 means that you are interested in all the subsequences between matched regular expressions (token separators).
  • 0 means that you are interested in all the matched regular expressions (token separators).
  • Any other value n means that you are interested in the matched nth subexpression inside the regular expressions.

The book's site provides example code for sregex_token_iterator and sregex_iterator, which should also be helpful.

Shafik Yaghmour answered Oct 05 '22 11:10
