
C++ regex, unknown escape sequence '\.' warning

Tags:

c++

regex

This is my first time using regular expressions in C++, and I'm a little confused about escape sequences. I'm simply trying to match a dot at the beginning of a string. For that I'm using the expression "^\\\.", which works, but my compiler (g++) generates a warning:

warning: unknown escape sequence '\.'
        regex self_regex("^\\\.");
                             ^~

If I'm using e.g "^\\.", it does not generate a warning, but that regex does not match what I intend to do.

I also don't understand why I have to use three backslashes. Shouldn't two be sufficient? In "\\." the first backslash escapes the second one, so that I actually search for a literal ., but it doesn't work. Can someone please clarify this for me?

Code:

#include <iostream>
#include <dirent.h>
#include <regex>

using namespace std;

int main(void){
    DIR *dir;
    string path = "/Users/-----------/Documents/Bibliothek/MachineLearning/DeepLearning/ConvolutionalNeuralNetworks/CS231n 2016/Assignments/assignment3/assignment3/cs231n";
    regex self_regex("^\\\.+");
    struct dirent *ent;
    if ((dir = opendir(path.c_str())) != NULL){
        while ((ent = readdir(dir)) != NULL){
            if (regex_search(string(ent->d_name),self_regex)){
                cout << "matches regex" << ent->d_name << endl;
            }
            else{
                cout << "does not match regex " << ent->d_name << endl;
            }
        }
        closedir(dir);
    }
    return 0;
}

Output:

matches regex.
matches regex..
matches regex.DS_Store
matches regex.gitignore
does not match regex __init__.py
does not match regex __init__.pyc
does not match regex build
does not match regex captioning_solver.py
does not match regex captioning_solver.pyc
does not match regex classifiers
does not match regex coco_utils.py
does not match regex coco_utils.pyc
does not match regex data_utils.py
does not match regex datasets
does not match regex fast_layers.py
does not match regex fast_layers.pyc
does not match regex gradient_check.py
does not match regex gradient_check.pyc
does not match regex im2col.py
does not match regex im2col.pyc
does not match regex im2col_cython.c
does not match regex im2col_cython.pyx
does not match regex im2col_cython.so
does not match regex image_utils.py
does not match regex image_utils.pyc
does not match regex layer_utils.py
does not match regex layers.py
does not match regex layers.pyc
does not match regex optim.py
does not match regex optim.pyc
does not match regex rnn_layers.py
does not match regex rnn_layers.pyc
does not match regex setup.py
asked Aug 02 '16 10:08 by eager2learn


2 Answers

When you write a string literal in your code:

"^\\\."  

the compiler parses it according to the C++ rules to produce the string stored in your executable. For example, if it encounters \n, the stored string contains a newline character instead. The "\\" is transformed into "\", but the compiler doesn't know how to handle "\." because C++ defines no such escape sequence.

As the C++ standard puts it: "Escape sequences in which the character following the backslash is not listed (...) are conditionally-supported, with implementation-defined semantics."

So the string you're looking for has only two backslashes:

"^\\."

which will be transformed by the compiler into:

"^\."  

And this is the regex you're looking for!

Remark: GCC, for example, transforms the unknown escape sequence "\." into ".", so two or three backslashes produce the same string in practice.
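
To make the two layers of escaping visible, here is a minimal sketch that inspects the string the regex engine would actually receive. The function names are hypothetical, and the raw string literal (a C++11 feature) is an addition for comparison, not part of the original answer.

```cpp
#include <string>

// Hypothetical helper: the literal "^\\." after the compiler has
// processed escape sequences. The pair \\ becomes one backslash,
// so the stored string holds three characters: ^  \  .
std::string escaped_literal() {
    return "^\\.";
}

// The same pattern as a C++11 raw string literal. The compiler does
// no escape processing between R"( and )", so one backslash suffices.
std::string raw_literal() {
    return R"(^\.)";
}
```

Both functions should yield the same three-character string, which is why "^\\." compiles without a warning while still handing \. to the regex engine.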

answered Oct 19 '22 18:10 by Christophe


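The compiler generates a warning because not every escape sequence has a meaning in C++. The valid ones are the simple escapes \', \", \?, \\, \a, \b, \f, \n, \r, \t and \v, together with the octal (\ooo), hexadecimal (\xhh) and universal-character-name (\uXXXX, \UXXXXXXXX) forms.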

However, the regex engine expects you to escape '.' in order to match a literal '.' character instead of any character. To escape '.' in a regex pattern, you must put a single '\' character before it. But since a single '\' starts an escape sequence in C++, you need to write two backslashes: "\\". Therefore, the correct pattern is "^\\.".
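
The difference between an escaped and an unescaped dot can be sketched as follows. This assumes the default ECMAScript grammar of std::regex, and the helper names are hypothetical, chosen only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `name` begins with a literal dot.
// The C++ literal "^\\." reaches the regex engine as ^\.
bool starts_with_dot(const std::string& name) {
    static const std::regex dot_prefix("^\\.");
    return std::regex_search(name, dot_prefix);
}

// For contrast: an unescaped dot matches any single character
// (other than a line terminator), so any non-empty name matches.
bool starts_with_any_char(const std::string& name) {
    static const std::regex any_prefix("^.");
    return std::regex_search(name, any_prefix);
}
```

With names from the question's output, starts_with_dot(".gitignore") should be true and starts_with_dot("build") false, whereas starts_with_any_char("build") is true.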

answered Oct 19 '22 18:10 by Fatih BAKIR