regex with all components optionals, how to avoid empty matches

Tags:

c++

regex

c++11

I have to process a comma-separated string that contains triplets of values and translate them into runtime types; the input looks like:

"1x2y3z,80r160g255b,48h30m50s,1x3z,255b,1h,..."

So each substring should be transformed this way:

"1x2y3z"      should become Vector3 with x = 1,  y = 2,   z = 3
"80r160g255b" should become Color   with r = 80, g = 160, b = 255
"48h30m50s"   should become Time    with h = 48, m = 30,  s = 50

The problem I'm facing is that all the components are optional (but they preserve order) so the following strings are also valid Vector3, Color and Time values:

"1x3z" Vector3 x = 1, y = 0, z = 3
"255b" Color   r = 0, g = 0, b = 255
"1h"   Time    h = 1, m = 0, s = 0

What have I tried so far?

All components optional

((?:\d+A)?(?:\d+B)?(?:\d+C)?)

The A, B and C are replaced with the correct letter for each case. The expression almost works, but it gives twice the expected number of results (one match for the string and another match for an empty string just after the first match), for example:

"1h1m1s" two matches [1]: "1h1m1s" [2]: ""
"11x50z" two matches [1]: "11x50z" [2]: ""
"11111h" two matches [1]: "11111h" [2]: ""

This isn't unexpected; after all, the empty string matches the expression when ALL of the components are absent. So, to fix this issue, I tried the following:

1 to 3 quantifier

((?:\d+[ABC]){1,3})

But now the expression matches strings with the wrong order or even repeated components:

"1s1m1h" one match, should not match at all! (wrong order)
"11z50z" one match, should not match at all! (repeated components)
"1r1r1b" one match, should not match at all! (repeated components)

For my last attempt, I tried this variant of my first expression:

Match from the beginning `^` to the end `$`

^((?:\d+A)?(?:\d+B)?(?:\d+C)?)$

It works better than the first version, but it still matches the empty string. Also, I would have to tokenize the input first and pass each token to the expression separately, so that each test string can satisfy the beginning (^) and end ($) anchors.

EDIT: Lookahead attempt (thanks to Casimir et Hippolyte)

After reading about (and trying to understand) the regex lookahead concept, and with the help of Casimir et Hippolyte's answer, I tried the suggested expression:

\b(?=[^,])(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b

Against the following test string:

"48h30m50s,1h,1h1m1s,11111h,1s1m1h,1h1h1h,1s,1m,1443s,adfank,12322134445688,48h"

And the results were amazing! It detects complete valid matches flawlessly (other expressions gave me three matches on "1s1m1h" or "1h1h1h", which weren't intended to be matched at all). Unfortunately, it captures an empty match every time an invalid token is found, so a "" is detected just before "1s1m1h", "1h1h1h", "adfank" and "12322134445688". So I modified the lookahead condition to get the expression below:

\b(?=(?:\d+[ABC]){1,3})(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b

It gets rid of the empty matches for any string that doesn't match (?:\d+[ABC]){1,3}, so the empty matches just before "adfank" and "12322134445688" are gone, but the ones just before "1s1m1h" and "1h1h1h" are still detected.

So the question is: is there a regular expression that matches the three values of a triplet in a given order, where every component is optional but at least one must be present, and that doesn't match empty strings?

The regex library I'm using is the C++11 one (std::regex).

asked May 14 '15 10:05

PaperBirdMaster

2 Answers

Yes, you can add a lookahead at the beginning to ensure there is at least one character:

^(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)$

If you need to find this kind of substring in a larger string (that is, without tokenizing first), you can remove the anchors and use a more explicit subpattern in the lookahead:

(?=\d+[ABC])((?:\d+A)?(?:\d+B)?(?:\d+C)?)

In this case, to avoid false positives (since you are looking for very short strings that can be part of something else), you can add word boundaries to the pattern:

\b(?=\d+[ABC])((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b

Note: in a comma-delimited string, (?=\d+[ABC]) can be replaced by (?=[^,]).

answered Sep 19 '22 02:09

Casimir et Hippolyte

I think this might do the trick.

I am keying on either the beginning of the string (^) or the comma separator (,) to fix the start of each match: (?:^|,).

Example:

#include <iostream>
#include <regex>
#include <string>

const std::regex r(R"~((?:^|,)((?:\d+[xrh])?(?:\d+[ygm])?(?:\d+[zbs])?))~");

int main()
{
    std::string test = "1x2y3z,80r160g255b,48h30m50s,1x3z,255b";

    std::sregex_iterator iter(test.begin(), test.end(), r);
    std::sregex_iterator end_iter;

    for(; iter != end_iter; ++iter)
        std::cout << iter->str(1) << '\n';
}

Output:

1x2y3z
80r160g255b
48h30m50s
1x3z
255b

Is that what you are after?

EDIT:

If you really want to go to town and leave empty tokens unmatched, then as far as I can tell you have to spell out every non-empty, in-order combination of components, like this:

const std::string A = "(?:\\d+[xrh])";
const std::string B = "(?:\\d+[ygm])";
const std::string C = "(?:\\d+[zbs])";

const std::regex r("(?:^|,)(" + A + B + C + "|" + A + B + "|" + A + C + "|" + B + C + "|" + A + "|" + B + "|" + C + ")");

answered Sep 20 '22 02:09

Galik
