While I was parsing the Snort regex set I found a very odd character class syntax, like [\x80-t] or [\x01-t\x0B\x0C\x0E-t\x80-t], and I can't figure out (really no clue) what -t means. I don't even know if it's standard PCRE or some sort of Snort extension.
Here are some regular expressions that contain these character classes:
/\x3d\x00\x12\x00..........(.[\x80-t]|...[\x80-t])/smiR
/^To\x3A[^\r\n]+[\x01-t\x0B\x0C\x0E-t\x80-t]/smi
PS: please note that \x80-t is not even a valid range in the usual sense, because the character t is \x74, which is lower than \x80.
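To make the point above concrete, here is a small sketch using Python's re module as a stand-in. This is an assumption: Snort uses PCRE, not Python, although PCRE likewise rejects a range whose end is below its start. The helper name is_valid_class is made up for illustration.

```python
import re

def is_valid_class(pattern):
    """Return True if Python's re engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# 't' is 0x74, which is below 0x80, so \x80-t runs "backwards".
print(hex(ord("t")))                 # 0x74
print(is_valid_class(r"[\x80-t]"))   # False: bad character range
print(is_valid_class(r"[t-\x80]"))   # True: same endpoints, ascending order
```

So whatever produced these rules, the -t cannot be an ordinary ASCII range end.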
This could refer to a different character encoding where t is larger than \x80 and \x80 can't be addressed normally.
Take EBCDIC scan codes for example (see here for a reference).
(But I, too, have no clue why somebody would want to write it that way.)
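As a quick check of the EBCDIC idea, the sketch below compares the code of t in ASCII and in one EBCDIC code page. The choice of Python's cp037 codec is an assumption; the answer does not say which EBCDIC variant it has in mind.

```python
# Assumption: cp037 (EBCDIC US/Canada) stands in for "EBCDIC" here.
EBCDIC_CODEC = "cp037"

t_ascii = "t".encode("ascii")[0]
t_ebcdic = "t".encode(EBCDIC_CODEC)[0]

print(hex(t_ascii))      # 0x74
print(hex(t_ebcdic))     # 0xa3
# In this code page, \x80-t would be an ascending (valid) range.
print(0x80 <= t_ebcdic)  # True
```

This only shows that the range would be well-formed under EBCDIC; it says nothing about whether the Snort rules were ever written with such an encoding in mind.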
For ASCII I have a wild guess: if -t means "until the next token minus 1" or, if placed last in the class, "until the end of the allowed characters", the second regex would state this:
To:(not a newline, one or more characters)(not a newline)
So basically the expression [\x01-t\x0B\x0C\x0E-t\x80-t] would mean roughly [^\r\n] (strictly, \x01-t would then run up to \x0A, so this reading excludes only \x00 and \r).
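The guess above can be tested mechanically. The following is a minimal sketch, assuming byte values 0x00 to 0xFF and assuming that -t means "up to the next token minus 1, or up to 0xFF when it comes last". The function name expand_dash_t is hypothetical, and this is not how PCRE or Snort interprets the class.

```python
import re

def expand_dash_t(class_body):
    """Expand a class body under the guessed meaning of '-t'."""
    # Each token is a \xHH escape, optionally followed by the odd '-t' suffix.
    tokens = re.findall(r"\\x([0-9A-Fa-f]{2})(-t)?", class_body)
    starts = [int(hex_digits, 16) for hex_digits, _ in tokens]
    members = set()
    for i, (_, suffix) in enumerate(tokens):
        start = starts[i]
        if suffix:
            # Guess: run up to the next token minus 1, or to 0xFF at the end.
            end = starts[i + 1] - 1 if i + 1 < len(tokens) else 0xFF
        else:
            end = start
        members.update(range(start, end + 1))
    return members

covered = expand_dash_t(r"\x01-t\x0B\x0C\x0E-t\x80-t")
excluded = sorted(set(range(256)) - covered)
print([hex(b) for b in excluded])  # ['0x0', '0xd']
```

Under this reading the class covers every byte except \x00 and \x0D (\r), which is close to [^\r\n] but still lets \n through. Applied to \x80-t alone, the same function returns the bytes 0x80 to 0xFF.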
If one applies that to (.[Ç-t]|...[Ç-t]), it would match any character beyond 7-bit ASCII, which could also cover all of Unicode (besides the first 128 characters).
(That being said, I still have no clue why somebody would write it like this, but at least that's a coherent explanation besides "It's a bug".)
Maybe helpful: what do the regexes you posted mean if one writes out the \xYY escapes? In ASCII:
/=\NULL\DEVICE_CONTROL_2\NULL.{10}(.[Ç-t]|...[Ç-t])/smiR
/^To:[^\r\n]+[\START_OF_HEADING-t\VERTICALTAB\FORMFEED\SHIFTOUT-tÇ-t]/smi
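The written-out forms above can be reproduced with a short sketch. The table CONTROL_NAMES and the function spell_out are made up for illustration. Decoding \x80 with code page 437 is an assumption that happens to yield the Ç seen above, since \x80 has no meaning in 7-bit ASCII.

```python
import re

# Hypothetical lookup table covering only the control codes in the posted regexes.
CONTROL_NAMES = {
    0x00: "NULL",
    0x01: "START_OF_HEADING",
    0x0B: "VERTICALTAB",
    0x0C: "FORMFEED",
    0x0E: "SHIFTOUT",
    0x12: "DEVICE_CONTROL_2",
}

def spell_out(pattern):
    """Replace each \\xHH escape with a readable name or character."""
    def repl(match):
        value = int(match.group(1), 16)
        if value in CONTROL_NAMES:
            return "\\" + CONTROL_NAMES[value]
        if value < 0x80:
            return chr(value)                   # printable ASCII
        return bytes([value]).decode("cp437")   # assumption: CP437 for high bytes
    return re.sub(r"\\x([0-9A-Fa-f]{2})", repl, pattern)

print(spell_out(r"/\x3d\x00\x12\x00..........(.[\x80-t]|...[\x80-t])/smiR"))
print(spell_out(r"/^To\x3A[^\r\n]+[\x01-t\x0B\x0C\x0E-t\x80-t]/smi"))
```

Note that the -t suffixes survive unchanged, so spelling the escapes out does not by itself explain them.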
Looking into the \x12 (Device Control 2) could help, because that won't show up in text, but may show up in network traffic.
The second regex matches lines that begin with To: (case-insensitive) followed by at least one character that isn't a line feed or carriage return. Since this is a greedy match, I'd expect \r or \n to be the only possible terminating matches in the [\x01-t\x0B\x0C\x0E-t\x80-t] character class. Note: \r is equivalent to \x0D and \n is equivalent to \x0A. I'm not sure what -t means, but let's pretend it was - instead. Then the character class would be [\x01-\x0B\x0C\x0E-\x80-], which is still a bit convoluted but would make a little more sense, i.e. allowing a \n as a terminating character but not \r.
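The "pretend it was -" reading can be tried out directly. This sketch again uses Python's re module on bytes as a stand-in for PCRE (an assumption), and the sample header line is invented for illustration.

```python
import re

# The class with every "-t" replaced by "-"; the trailing "-" is a literal hyphen.
PRETEND_CLASS = rb"[\x01-\x0B\x0C\x0E-\x80-]"

def in_class(byte_value):
    """True if the single byte is matched by the pretend class."""
    return re.fullmatch(PRETEND_CLASS, bytes([byte_value])) is not None

print(in_class(0x0A))  # True:  \n is accepted
print(in_class(0x0D))  # False: \r is not
print(in_class(0x00))  # False: NUL is not

# The whole second regex under this reading, with the s, m and i flags.
full = re.compile(rb"^To\x3A[^\r\n]+" + PRETEND_CLASS, re.S | re.M | re.I)
match = full.search(b"To: bob\n")
print(match.group(0))  # b'To: bob\n'
```

On a line ending in \n, the greedy [^\r\n]+ stops at the newline and the class then consumes it, which fits the "terminating character" idea above.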
This is a very long shot, but is there any chance this could be some kind of search-and-replace gone wrong? (I guess this can probably be quickly discounted if there are other regexes that have normal ranges without the t.)
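One way to check the search-and-replace idea is to count how many \xHH ranges in the rule set end in t versus a normal \xHH endpoint. The sketch below is only illustrative: count_range_styles is a hypothetical helper, and it is run on the two regexes from the question rather than on the real Snort rule files.

```python
import re

# A \xHH start followed by "-" and either a literal t or another \xHH escape.
RANGE = re.compile(r"\\x[0-9A-Fa-f]{2}-(t|\\x[0-9A-Fa-f]{2})")

def count_range_styles(patterns):
    """Count ranges ending in 't' versus ranges ending in a \\xHH escape."""
    counts = {"dash_t": 0, "normal": 0}
    for pattern in patterns:
        for match in RANGE.finditer(pattern):
            key = "dash_t" if match.group(1) == "t" else "normal"
            counts[key] += 1
    return counts

posted = [
    r"/\x3d\x00\x12\x00..........(.[\x80-t]|...[\x80-t])/smiR",
    r"/^To\x3A[^\r\n]+[\x01-t\x0B\x0C\x0E-t\x80-t]/smi",
]
print(count_range_styles(posted))  # {'dash_t': 5, 'normal': 0}
```

If the full rule set showed many normal ranges alongside the -t ones, a blanket search-and-replace would look less likely. A genuine range that ends at a literal t would also be counted as dash_t, so the numbers are only a rough indicator.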