This is my first time using regular expressions in C++, and I'm a little confused about escape sequences. I'm simply trying to match a dot at the beginning of a string. For that I'm using the expression "^\\\.", which works, but my compiler (g++) generates a warning:
warning: unknown escape sequence '\.'
regex self_regex("^\\\.");
^~
If I use e.g. "^\\." instead, it does not generate a warning, but that regex does not match what I intend to match.
I also don't understand why I have to use three backslashes. Shouldn't two be sufficient? In "\\." the first backslash escapes the second one, so that I actually search for a literal ., but it doesn't work. Can someone please clarify this for me?
Code:
#include <iostream>
#include <string>
#include <dirent.h>
#include <regex>
using namespace std;
int main(void){
    DIR *dir;
    string path = "/Users/-----------/Documents/Bibliothek/MachineLearning/DeepLearning/ConvolutionalNeuralNetworks/CS231n 2016/Assignments/assignment3/assignment3/cs231n";
    regex self_regex("^\\\.+");
    struct dirent *ent;
    // Open the directory once; a second opendir() call would leak the first handle.
    if ((dir = opendir(path.c_str())) != NULL){
        while ((ent = readdir(dir)) != NULL){
            if (regex_search(string(ent->d_name),self_regex)){
                cout << "matches regex" << ent->d_name << endl;
            }
            else{
                cout << "does not match regex " << ent->d_name << endl;
            }
        }
        closedir(dir);
    }
    return 0;
}
Output:
matches regex.
matches regex..
matches regex.DS_Store
matches regex.gitignore
does not match regex __init__.py
does not match regex __init__.pyc
does not match regex build
does not match regex captioning_solver.py
does not match regex captioning_solver.pyc
does not match regex classifiers
does not match regex coco_utils.py
does not match regex coco_utils.pyc
does not match regex data_utils.py
does not match regex datasets
does not match regex fast_layers.py
does not match regex fast_layers.pyc
does not match regex gradient_check.py
does not match regex gradient_check.pyc
does not match regex im2col.py
does not match regex im2col.pyc
does not match regex im2col_cython.c
does not match regex im2col_cython.pyx
does not match regex im2col_cython.so
does not match regex image_utils.py
does not match regex image_utils.pyc
does not match regex layer_utils.py
does not match regex layers.py
does not match regex layers.pyc
does not match regex optim.py
does not match regex optim.pyc
does not match regex rnn_layers.py
does not match regex rnn_layers.pyc
does not match regex setup.py
Character combinations consisting of a backslash (\) followed by a letter or by a combination of digits are called "escape sequences." To represent a newline character, single quotation mark, or certain other characters in a character constant, you must use escape sequences.
When you write in your code a string literal:
"^\\\."
your compiler parses it according to the C++ rules to produce the string that ends up in your executable. For example, if it encountered \n, the string in your executable would contain a newline instead. The "\\" is transformed into "\", but your compiler doesn't know how to handle "\." because no such escape sequence is defined in C++.
Escape sequences in which the character following the backslash is not listed (...) are conditionally-supported, with implementation-defined semantics.
So the string you're looking for has only two backslashes:
"^\\."
which will be transformed by the compiler into:
"^\."
And this is the regex you're looking for!
Remark: GCC, for example, transforms the unknown escape sequence "\." into ".", so two or three backslashes in reality produce the same result.
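One way to see what the compiler actually stores is to look at the resulting string itself. The following is only a minimal sketch; the function names escaped_pattern and raw_pattern are made up for illustration. It assumes a C++11 compiler, since it also uses a raw string literal, which lets you write the backslash only once.

```cpp
#include <string>

// Hypothetical helper: the pattern written as an ordinary string literal.
// In the source, "\\" is one escape sequence that produces a single backslash,
// so the stored string has three characters: '^', '\' and '.'.
std::string escaped_pattern() {
    return "^\\.";
}

// The same pattern as a raw string literal (C++11). No escape processing
// happens between R"( and )", so the backslash is written only once.
std::string raw_pattern() {
    return R"(^\.)";
}
```

Both functions should return the same three-character string, which is exactly what the regex engine receives.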
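The compiler generates a warning because not every escape sequence has a meaning in C++. The valid ones are the simple escape sequences (\', \", \?, \\, \a, \b, \f, \n, \r, \t, \v), octal and hexadecimal escapes, and universal character names.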
However, regex expects you to escape '.' in order to match a literal '.' character instead of any character. To escape '.' in a regex pattern, you must put a single '\' character before it. But since a single '\' starts an escape sequence in C++, you need to write two backslashes: "\\". Therefore, the correct pattern is "^\\.".
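To illustrate the difference between the escaped and the unescaped dot, here is a small sketch using std::regex_search with the default ECMAScript grammar. The helper names starts_with_dot and starts_with_any_char are hypothetical and not part of the original code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the name begins with a literal dot.
// The C++ literal "^\\." becomes the regex ^\. after escape processing.
bool starts_with_dot(const std::string& name) {
    static const std::regex dot_at_start("^\\.");
    return std::regex_search(name, dot_at_start);
}

// For contrast: an unescaped '.' matches any character except a line
// terminator, so this is true for ordinary non-empty file names.
bool starts_with_any_char(const std::string& name) {
    static const std::regex any_at_start("^.");
    return std::regex_search(name, any_at_start);
}
```

With these helpers, starts_with_dot(".gitignore") should be true and starts_with_dot("build") false, while starts_with_any_char("build") is true because the unescaped dot accepts the letter b.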