 

How can I detect string literals in code?

Tags: c++, regex, c++11

I want to write a string-detecting function for my obfuscator, but I'm stuck debugging it. I can write a pattern for strings like cout<<"Hello world" or cout<<"2+2=4"

but not for

cout<<"2+2"<<"Trolll";
cout<<"asd \" trololo";

Simply put, I just want to extract the text between an opening " and its closing ". I tried

["][\x20-\x74]*["]

but for, e.g.,

cout<<"asdfg"<<"asdsfgh";

it gives me "asdfg"<<"asdsfgh", not "asdfg".

Any ideas on how to build an expression for extracting strings?

asked Jun 04 '26 15:06 by encoree1337


1 Answer

Quantifiers in regular expressions are greedy by default: they match as much as possible. There are several ways to prevent this; the easiest is to make the quantifier non-greedy (lazy). You can make * non-greedy by appending a ?, giving *?:

"[\x20-\x74]*?"
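To see the difference, here is a minimal sketch that runs both the greedy and the lazy pattern through std::regex, whose default ECMAScript grammar supports the lazy *? quantifier. The helper first_literal and the two pattern constants are hypothetical names used only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` matched by
// `pattern` (std::regex's default ECMAScript grammar), or "" if none matches.
std::string first_literal(const std::string& pattern, const std::string& text) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) {
        return m.str(0);
    }
    return "";
}

// The question's pattern (without the redundant [...] around the quotes)
// and its lazy variant, written as raw string literals so "\x20" reaches
// the regex engine unchanged.
const std::string greedy_pattern = R"("[\x20-\x74]*")";
const std::string lazy_pattern = R"("[\x20-\x74]*?")";
```

On the question's input cout<<"asdfg"<<"asdsfgh";, the greedy pattern returns "asdfg"<<"asdsfgh", while the lazy one stops at the first closing quote and returns "asdfg".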

(Incidentally, there’s no need for the […] around the quotes.)

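However, it pays to be explicit and precise about what you want to match, because the expression above is still buggy. For instance, it doesn't match "\"" correctly: it treats the escaped quote as the closing one. Also, the range \x20-\x74 ends at t (0x74), so a literal containing letters u through z, or characters such as { and ~, is not matched at all.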

A string literal in C++ is quite well-defined, and your definition simply doesn't match it. The actual definition (§2.14.5 of the C++11 standard; escape sequences are defined in §2.14.3) is, simplified: a possibly empty char-sequence surrounded by ", where each character in the sequence is either any character except ", \ and newline, or an escape-sequence.

An escape-sequence, in turn, is either simple, octal or hexadecimal. Taken together, this leaves us with (again, slightly simplified):

"([^"\\]|\\(['"?\\abfnrtv]|[0-7]+|x[0-9a-fA-F]+))*"

There's no need for the non-greedy specifier now, since the pattern explicitly excludes " from matching inside the literal unless it is escaped.
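As a quick check of the final pattern, here is a minimal sketch that runs it over the question's examples with std::regex and std::sregex_iterator. The function name extract_literals is hypothetical; the pattern is written as a raw string literal so its quotes and backslashes reach the regex engine without extra C++ escaping.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every string literal in `code`, in order,
// using the answer's simplified pattern. The raw string delimiter "re"
// lets the pattern contain quotes and backslashes as-is.
std::vector<std::string> extract_literals(const std::string& code) {
    static const std::regex literal(
        R"re("([^"\\]|\\(['"?\\abfnrtv]|[0-7]+|x[0-9a-fA-F]+))*")re");
    std::vector<std::string> found;
    // Iterate over all non-overlapping matches, left to right.
    for (std::sregex_iterator it(code.begin(), code.end(), literal), end;
         it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```

For cout<<"2+2"<<"Trolll"; this yields "2+2" and "Trolll" separately, and for cout<<"asd \" trololo"; the escaped quote stays inside the single literal "asd \" trololo". Like the pattern itself, this is simplified: it does not skip comments or character literals such as '"', and it does not handle raw string literals.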

answered Jun 07 '26 12:06 by Konrad Rudolph


