Snort/PCRE Regex: odd character class syntax

Tags: regex, pcre, snort

While parsing the Snort regex set I found a very odd character class syntax, such as [\x80-t] or [\x01-t\x0B\x0C\x0E-t\x80-t], and I can't figure out what -t means (I really have no clue). I don't even know whether it's standard PCRE or some sort of Snort extension.

Here are some regular expressions that contain these character classes:

/\x3d\x00\x12\x00..........(.[\x80-t]|...[\x80-t])/smiR
/^To\x3A[^\r\n]+[\x01-t\x0B\x0C\x0E-t\x80-t]/smi

PS: please note that \x80-t is not even a valid range under the standard reading, because the character t is \x74, which is below \x80.
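To illustrate the point about the range being out of order, here is a minimal sketch using Python's re module on byte patterns. This is only a stand-in: Python's re is not PCRE and Snort is not involved, and the helper name compiles is made up for this example. It only shows that a regex engine reading \x80-t as an ordinary range has good reason to reject it.

```python
import re


def compiles(pattern: bytes) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# 't' is 0x74, which is below 0x80, so read as a plain range it runs backwards.
reversed_range_ok = compiles(rb'[\x80-t]')

# For comparison, a range that ends at 0xFF is well-formed.
forward_range_ok = compiles(rb'[\x80-\xff]')

print(hex(ord('t')), reversed_range_ok, forward_range_ok)
```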

Simone-Cu asked Dec 12 '13 14:12


2 Answers

This could refer to a different character encoding in which t is larger than \x80 and \x80 can't be addressed normally.

Take EBCDIC Scan codes for example (see here for a reference).

(But I too have no clue why somebody would want to write it that way)
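A quick way to check the encoding idea is to look up where t sits in an EBCDIC code page. The sketch below assumes Python's built-in cp037 codec as a representative EBCDIC variant. That choice is an assumption made for illustration, and nothing in the question indicates that Snort rules are actually written against EBCDIC.

```python
# cp037 is one EBCDIC code page shipped with Python; picking it is an assumption.
ASSUMED_CODEC = "cp037"

t_in_ebcdic = "t".encode(ASSUMED_CODEC)[0]
t_in_ascii = "t".encode("ascii")[0]

# In this code page 't' lands above 0x80, so \x80-t would be an ordered range.
range_is_ordered_in_ebcdic = 0x80 <= t_in_ebcdic

print(hex(t_in_ascii), hex(t_in_ebcdic), range_is_ordered_in_ebcdic)
```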

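For ASCII I have a wild guess: if -t means "up to the next token minus 1" or, when it comes last in the class, "up to the end of the allowed characters", then the second regex would read as:

To:(one or more characters that are not a newline)(one character from the class)

Under that reading, [\x01-t\x0B\x0C\x0E-t\x80-t] expands to \x01-\x0A, \x0B, \x0C, \x0E-\x7F and \x80-\xFF, so it would basically mean [^\x00\r]: \n (\x0A) stays inside the first range, while \r (\x0D) and NUL are left out.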
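Expanding the class mechanically under this guess is a useful check. The sketch below implements only the guessed rule ("-t" runs up to the byte before the next token, or up to 0xFF when it is last). The function name expand_hypothetical_class and the 0xFF upper bound are assumptions made for illustration, not documented PCRE or Snort behaviour.

```python
import re

# One token is a \xHH escape, optionally followed by the mysterious "-t".
TOKEN = re.compile(r'\\x([0-9A-Fa-f]{2})(-t)?')


def expand_hypothetical_class(class_body: str, max_byte: int = 0xFF) -> set:
    """Expand a class body under the guessed meaning of "-t"."""
    tokens = [(int(m.group(1), 16), m.group(2) is not None)
              for m in TOKEN.finditer(class_body)]
    members = set()
    for i, (start, open_ended) in enumerate(tokens):
        if not open_ended:
            members.add(start)  # plain single byte
            continue
        # Guess: the range stops one byte before the next token,
        # or at max_byte when this is the last token in the class.
        end = tokens[i + 1][0] - 1 if i + 1 < len(tokens) else max_byte
        members.update(range(start, end + 1))
    return members


members = expand_hypothetical_class(r'\x01-t\x0B\x0C\x0E-t\x80-t')
excluded = sorted(set(range(256)) - members)
print([hex(b) for b in excluded])
```

With these assumptions the only excluded bytes are \x00 and \x0D, so \n (\x0A) ends up inside the class.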

If one applies that to (.[\x80-t]|...[\x80-t]), it would address any character beyond 7-bit ASCII, which could also cover all of Unicode (apart from the first 128 characters).

(That being said, I still have no clue why somebody would write it like this, but at least that's a coherent explanation besides "It's a bug".)

Maybe helpful: what do the regexes you posted look like if one writes out the \xYY escapes? In ASCII (with Ç standing in for \x80, which is outside ASCII):

/=\NULL\DEVICE_CONTROL_2\NULL.{10}(.[Ç-t]|...[Ç-t])/smiR
/^To:[^\r\n]+[\START_OF_HEADING-t\VERTICAL_TAB\FORM_FEED\SHIFT_OUT-tÇ-t]/smi

Looking into the \x12 (Device Control 2) could help, because it won't show up in text but may appear in network traffic.

Angelo Fuchs answered Sep 30 '22 21:09


The second regex matches lines that begin with To: (case-insensitive) followed by at least one character that isn't a line feed or carriage return. Since this is a greedy match, I'd expect \r or \n to be the only possible terminating matches for the [\x01-t\x0B\x0C\x0E-t\x80-t] character class. Note: \r is equivalent to \x0D and \n is equivalent to \x0A.

I'm not sure what -t means, but let's pretend it was - instead. Then the character class would be [\x01-\x0B\x0C\x0E-\x80-], which is still a bit convoluted but would make a little more sense, i.e. it allows \n as a terminating character but not \r.
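The "pretend it was -" reading can be tried out directly. This is a small sketch using Python's re module on bytes as a stand-in for PCRE, which is an assumption since Snort itself uses PCRE. The variable names are made up for this example.

```python
import re

# The class from above with every "-t" replaced by "-".
# The trailing "-" just before "]" is a literal hyphen.
pretend_class = re.compile(rb'[\x01-\x0B\x0C\x0E-\x80-]')

matches_lf = pretend_class.fullmatch(b'\n') is not None   # \x0A
matches_cr = pretend_class.fullmatch(b'\r') is not None   # \x0D

print(matches_lf, matches_cr)
```

Under this reading \n is accepted as the terminating character and \r is not, because \x0D falls in the gap between \x0C and \x0E.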

This is a very long shot, but is there any chance this could be some kind of search-and-replace gone wrong? (I guess this can probably be quickly discounted if there are other regexes that have normal ranges without the t.)

Steve Chambers answered Sep 30 '22 22:09