Can someone explain what MATLAB is doing with nul bytes (x00
) in regular expressions?
Examples:
>> regexp(char([0 0 0 0 0 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
1 % current
4 % expected
>> regexp(char([0 0 0 1 0 0 0 1 0 0 10 0 0 0]),char([1 0 0 0 46 0 0 10]))
ans =
4 % current
4 % expected
>> regexp(char([0 0 0 1 0 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
[] % current
[] % expected
>> regexp(char([0 0 0 0 10 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
1 % current
[] % expected
>> regexp(char([0 0 0 0 0 0 0 1 0 0 10 0 0 0]),char([1 0 0 0 46 0 0 10]))
ans =
[] % current
[] % expected
The answer might simply be, MATLAB regular expression isn't meant to handle non printable characters, but I would assume it would error if this was the case.
EDIT: The 46 is expected to be '.'
as in the regex wildcard.
EDIT2:
>> regexp(char([0 0 0 0 50 0 0 100 0 0 90 0 0 0]),char([0 0 46 0 0 90]))
ans =
1 9
I realized it could have been 10 being a special character so this one has only printable and nul bytes. I would expect this one to only match 9 because the fifth character 50
does not match 0
.
Description. startIndex = regexp( str , expression ) returns the starting index of each substring of str that matches the character patterns specified by the regular expression.
To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).
A regex is a specification of a pattern to be matched in the searched text. This pattern consists of a sequence of tokens, each being able to match a single character or a sequence of characters in the text, or assert that a specific position within the text has been reached (the latter is called an anchor.)
this bug is probably already fixed. I tested your example from Matlab Central in several versions:
in R2013b:
>> regexp(char([0 0 1 0 41 41 41 41 41 41]),char([0 '.' 0 40 40 40 40]))
ans =
2
in R2015a:
>> regexp(char([0 0 1 0 41 41 41 41 41 41]),char([0 '.' 0 40 40 40 40]))
ans =
2
in R2016a:
>> regexp(char([0 0 1 0 41 41 41 41 41 41]),char([0 '.' 0 40 40 40 40]))
ans =
[]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With