// regex_replace example
#include <iostream>
#include <string>
#include <regex>
#include <iterator>

int main()
{
    std::string INPUT = "Replace_All_Characters_With_Anything";
    std::string OUTEXP = "0";
    std::regex expression("[A-Za-z]", std::regex_constants::icase);
    std::cout << std::regex_replace(INPUT, expression, OUTEXP);
    return 0;
}
The same code works as expected here: http://cpp.sh/6gb5a and the same pattern works here: https://regexr.com/5bt9d
The problem seems to depend on whether the icase flag is used. The A in All, the C in Characters, the W in With, etc. do not get replaced, apparently because of the underscore in front of them. The bug seems to be that a [] character class only matches a character if that character does not come directly after a non-match.
There does seem to be a quick fix for this: if the brackets are followed by {1}, it works.
example: [A-Za-z]{1}
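As a minimal sketch of that quick fix, here is the example from the question with the quantifier appended. The helper name replaceLetters is made up for illustration; the pattern, the flag and the replacement string are the ones from the question.

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper (the name is not from the question): replaces every
// ASCII letter in the input with '0'.
// The explicit {1} is semantically redundant; it is only there to sidestep
// the behaviour described above.
std::string replaceLetters(const std::string& input)
{
    const std::regex expression("[A-Za-z]{1}", std::regex_constants::icase);
    return std::regex_replace(input, expression, "0");
}
```

Calling std::cout << replaceLetters("Replace_All_Characters_With_Anything"); prints 0000000_000_0000000000_0000_00000000 on a conforming implementation; whether it does so on an affected Visual Studio version is the claim made above, not something this sketch proves.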
Compiler: Microsoft Visual Studio Community 2019 / Version 16.7.3 / C++17
Also tested with C++14, same bad behavior.
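expected result: 0000000_000_0000000000_0000_00000000
my result (the letter after each underscore is left unreplaced): 0000000_A00_C000000000_W000_A0000000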
How do you use square brackets in regex? Use square brackets ([]) to create a matching list that matches any one of the characters in the list. Almost all regular expression metacharacters lose their special meaning and are treated as literal characters when used within square brackets.
Square brackets ("[ ]"): any expression within square brackets is a character set; if any one of its characters matches at the current position in the search string, that part of the pattern matches.
The regular expression [A-Z][a-z]* matches any sequence of letters that starts with an uppercase letter and is followed by zero or more lowercase letters.
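The two points above can be sketched with std::regex. Both helper names are hypothetical, and the second pattern, [.*+], is only an illustration of metacharacters being literal inside brackets.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole string is one uppercase letter
// followed by zero or more lowercase letters.
bool isCapitalizedWord(const std::string& text)
{
    static const std::regex pattern("[A-Z][a-z]*");
    return std::regex_match(text, pattern);
}

// Hypothetical helper: true if the string is exactly one of '.', '*' or '+'.
// Inside the brackets these are literal characters, not metacharacters.
bool isLiteralMetaChar(const std::string& text)
{
    static const std::regex pattern("[.*+]");
    return std::regex_match(text, pattern);
}
```

For example, isCapitalizedWord("Hello") is true while isCapitalizedWord("hello") and isCapitalizedWord("HeLLo") are false, and isLiteralMetaChar("*") is true while isLiteralMetaChar("a") is false.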
I'm not sure whether this is an appropriate use of an answer, but this is a known bug, and it looks like it has been known for a few months. There is no ETA on a fix as far as I can see.
https://github.com/microsoft/STL/issues/993
Looks like RE2 is a recommended alternative regex library.
https://github.com/google/re2/
Instead of using another library, I will create a function that intercepts the regex pattern string and rewrites it, as a temporary fix. It should work whether or not the icase flag is used.
test code: https://rextester.com/LSNW3495
// Append "{1}" after every square-bracket character class unless it is already
// followed by a quantifier ('?', '*', '+' or a valid '{n}', '{n,}', '{n,m}').
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

std::string temporaryBugFix(std::string exp)
{
    enum State
    {
        start,
        skipNext,
        lookForEndBracket,
        foundEndBracket,
    };
    State state = start;
    State prevState = start;
    int p = -1;
    std::vector<int> positionsToFix;
    for (auto c : exp)
    {
        ++p;
        switch (state)
        {
        case start:
            if (c == '\\')
            {
                prevState = state;
                state = skipNext;
            }
            else if (c == '[')
                state = lookForEndBracket;
            continue;
        case skipNext:
            // the character after a backslash is escaped, so ignore it
            state = prevState;
            continue;
        case lookForEndBracket:
            if (c == '\\')
            {
                prevState = state;
                state = skipNext;
            }
            else if (c == ']')
                state = foundEndBracket;
            continue;
        case foundEndBracket:
            // p is the position right after ']'; a '{' is recorded here too and
            // filtered out below if it starts a valid quantifier
            if (c != '+' && c != '*' && c != '?')
                positionsToFix.push_back(p);
            // treat c as ordinary input so a '[' or '\\' directly after ']' is not lost
            if (c == '\\')
            {
                prevState = start;
                state = skipNext;
            }
            else if (c == '[')
                state = lookForEndBracket;
            else
                state = start;
            continue;
        }
    }
    // the expression ended right after a ']'
    if (state == foundEndBracket)
        positionsToFix.push_back(p + 1);
    // find valid curly-bracket quantifiers so we don't add an additional one
    std::string s = exp;
    std::smatch m;
    std::regex e("\\{\\d+,?\\d*?\\}");
    int offset = 0;
    std::vector<int> validCurlyBracketPositions;
    while (std::regex_search(s, m, e))
    {
        validCurlyBracketPositions.push_back(static_cast<int>(m.position(0)) + offset);
        offset += static_cast<int>(m.position(0) + m[0].length());
        s = m.suffix();
    }
    // remove valid curly bracket positions from the fix vector
    for (auto pos : validCurlyBracketPositions)
        positionsToFix.erase(std::remove(positionsToFix.begin(), positionsToFix.end(), pos), positionsToFix.end());
    // insert the fixes back to front so the earlier positions stay valid
    for (int i = static_cast<int>(positionsToFix.size()); i--; )
        exp.insert(positionsToFix[i], "{1}");
    return exp;
}
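One way to wire such a fix into existing code is sketched below. It assumes only that the rewriting function takes and returns a std::string, as temporaryBugFix does; the wrapper name compilePatched is hypothetical and not part of the answer.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper (not part of the original answer): passes the pattern
// through a caller-supplied rewriting function, then compiles the result
// with the requested flags.
template <typename Fix>
std::regex compilePatched(const std::string& pattern, Fix fix,
                          std::regex::flag_type flags = std::regex_constants::ECMAScript)
{
    return std::regex(fix(pattern), flags);
}
```

With this in place, the construction in the question would become std::regex expression = compilePatched("[A-Za-z]", temporaryBugFix, std::regex_constants::icase); and the wrapper can be removed in one place once the library bug is fixed.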