I'd like to extract elements beginning with digits from a character vector but there's something about POSIX regular expression syntax that I don't understand.
I would think that
vec <- c("012 foo", "305 bar", "other", "notIt 7")
grep(pattern="[:digit:]", x=vec)
would return 1 2 4
since those are the three elements that have digits somewhere in them. But in fact it returns 3 4.
Likewise grep(pattern="^0", x=vec)
returns 1
as I would expect because element 1 starts with a zero. However grep(pattern="^[:digit:]", x=vec)
returns integer(0)
whereas I would expect it to return 1 2
since those are the elements that start with digits.
How am I misunderstanding the syntax?
?= is a positive lookahead, a type of zero-width assertion. It says that the captured match must be followed by whatever is within the parentheses, but that part isn't captured. For instance, a lookahead such as (?=.*\d) means the match needs to be followed by zero or more characters and then a digit (but again, that part isn't captured). In R, lookaheads are only available with perl=TRUE.
\d (digit) matches any single digit (same as [0-9] ). The uppercase counterpart \D (non-digit) matches any single character that is not a digit (same as [^0-9] ). \s (space) matches any single whitespace (same as [ \t\n\r\f] , blank, tab, newline, carriage-return and form-feed).
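To see these shorthand classes in R, here is a small sketch. The sample vector `x` and the result names are made up for illustration, and the backslashes are doubled because they sit inside R string literals.

```r
# Hypothetical sample strings, chosen only for illustration.
x <- c("a1", "b c", "xyz")

has_digit     <- grepl("\\d", x)  # TRUE FALSE FALSE
has_non_digit <- grepl("\\D", x)  # TRUE TRUE TRUE (each element has a non-digit)
has_space     <- grepl("\\s", x)  # FALSE TRUE FALSE

# \D is handy for stripping everything that is not a digit.
digits_only <- gsub("\\D", "", "305 bar")  # "305"
```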
The match-zero-or-more operator (*): this operator repeats the smallest possible preceding regular expression as many times as necessary (including zero) to match the pattern.
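As a quick illustration of "smallest possible preceding regular expression", here is a sketch in R with made-up input strings. In the first pattern the * applies only to the b right before it; in the second, parentheses make it apply to the whole group.

```r
# Hypothetical inputs for illustration only.
x <- c("ac", "abc", "abbbc", "adc")

# * binds to the single preceding "b": zero or more b's between a and c.
star_on_char <- grepl("^ab*c$", x)  # TRUE TRUE TRUE FALSE

y <- c("c", "abc", "ababc", "abbc")

# With a group, * repeats "ab" as a unit, including zero times.
star_on_group <- grepl("^(ab)*c$", y)  # TRUE TRUE TRUE FALSE
```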
Regular expressions (shortened as "regex") are special strings representing a pattern to be matched in a search operation. They are an important tool in a wide variety of computing applications, from programming languages like Java and Perl, to text processing tools like grep, sed, and the text editor vim.
Try
grep(pattern="[[:digit:]]", x=vec)
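instead. A POSIX character class such as [:digit:] is only recognized inside a bracket expression, so it needs an outer pair of brackets. Written as "[:digit:]", the pattern is just an ordinary bracket expression that matches any one of the characters :, d, i, g or t, which is why it matched elements 3 and 4 ("other" and "notIt 7" both contain a t) and why "^[:digit:]" matched nothing (no element starts with one of those characters).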
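A short sketch that puts the single-bracket and double-bracket forms side by side, using the vector from the question. The result names are only for illustration.

```r
vec <- c("012 foo", "305 bar", "other", "notIt 7")

# Single brackets: a plain set of the characters ":", "d", "i", "g", "t".
single <- grep("[:digit:]", vec)     # 3 4

# The same set written out by hand gives the same answer.
by_hand <- grep("[dgit:]", vec)      # 3 4

# Double brackets: the POSIX digit class inside a bracket expression.
double <- grep("[[:digit:]]", vec)   # 1 2 4
```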
Another solution is the \d shorthand (the backslash has to be doubled inside an R string):
grep(pattern="\\d", x=vec)
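Since the original goal was to extract the elements that begin with a digit, either form can be combined with the ^ anchor. A minimal sketch with the vector from the question (the variable names are just for illustration):

```r
vec <- c("012 foo", "305 bar", "other", "notIt 7")

# Indices of elements that start with a digit.
idx_posix <- grep("^[[:digit:]]", vec)   # 1 2
idx_short <- grep("^\\d", vec)           # 1 2

# value = TRUE returns the matching elements instead of their indices.
starts_with_digit <- grep("^[[:digit:]]", vec, value = TRUE)
# "012 foo" "305 bar"

# Equivalent logical-subsetting form with an explicit range.
same_elements <- vec[grepl("^[0-9]", vec)]
```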