I am using R and have the following string below:
s <- "\t\t\t \t\t\thello world ! \t\t\thello"
I want to count the whitespace characters at the start of the string only, not anywhere else. Whitespace between the words (or before the second "hello") should be ignored; only the leading run should be counted. For the string above, the result would be 7 (three tabs, one space, three tabs).
I have tried the following but it only returns a count of "1" ...
sapply(regmatches(s, gregexpr('^(\\s)', s)), length)
I am not very good at regex, any help is appreciated.
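A quick check of why the attempt above returns 1 (a sketch; it assumes base R's default regex engine): the pattern '^(\\s)' describes exactly one whitespace character anchored at position 1, and since ^ can only match at the start, gregexpr() finds it just once.

```r
s <- "\t\t\t \t\t\thello world ! \t\t\thello"

# '^(\\s)' matches a single whitespace character at the very start,
# so gregexpr() can report only one match for this string.
m <- regmatches(s, gregexpr("^(\\s)", s))

length(m[[1]])  # a single match
m[[1]]          # just the first tab character
```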
For matching only the first occurrence, regexpr() is more appropriate than gregexpr(). With that switch, sapply() is no longer necessary, because regexpr() returns an atomic vector whereas gregexpr() returns a list.
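To make the difference in return types concrete, here is a small sketch comparing the two functions on the string from the question. The variable names g and r are only for illustration.

```r
s <- "\t\t\t \t\t\thello world ! \t\t\thello"

g <- gregexpr("^\\s+", s)  # a list: one element per input string
r <- regexpr("^\\s+", s)   # an integer vector: one start position per input string

is.list(g)                 # TRUE
is.list(r)                 # FALSE
attr(r, "match.length")    # length of the first (here: leading) whitespace run
```

Because r is already a plain vector, its match.length attribute can be read directly, with no sapply() over a list.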
You could use the following regular expression and read the match.length attribute of the result of regexpr():
attr(regexpr("^\\s+", s), "match.length")
# [1] 7
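One caveat: when a string has no leading whitespace, regexpr() reports no match as -1, so match.length is -1 rather than 0. A minimal vectorised sketch that clamps this to zero is below; count_leading_ws() and count_leading_ws_sub() are hypothetical helper names, and the second one uses a different technique (strip the leading run with sub() and compare lengths) purely as a cross-check.

```r
# Hypothetical helper: count leading whitespace for each element of x.
count_leading_ws <- function(x) {
  len <- attr(regexpr("^\\s+", x), "match.length")
  # regexpr() uses -1 for "no match"; treat that as zero leading whitespace.
  pmax(len, 0L)
}

# Hypothetical cross-check: remove the leading run and compare character counts.
count_leading_ws_sub <- function(x) nchar(x) - nchar(sub("^\\s+", "", x))

v <- c("\t\t\t \t\t\thello", "hello", "  x", "")
count_leading_ws(v)
# [1] 7 0 2 0
count_leading_ws_sub(v)
# [1] 7 0 2 0
```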
Explanation of the regular expression:
^    anchors the match to the beginning of the string.
\\s  matches one whitespace character: tab, newline, vertical tab, form feed, carriage return, or space.
+    matches the preceding item one or more times.
Reference: http://en.wikibooks.org/wiki/R_Programming/Text_Processing
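The ^ anchor is what restricts the count to the start of the string. A small sketch on a made-up string whose only whitespace sits in the middle shows the difference with and without it.

```r
x <- "hello  world"

# Without ^, regexpr() finds the first whitespace run anywhere in the string.
unanchored <- regexpr("\\s+", x)
# With ^, the run must begin at character 1, so there is no match (-1).
anchored <- regexpr("^\\s+", x)

c(start = as.integer(unanchored), length = attr(unanchored, "match.length"))
# start length
#     6      2
c(start = as.integer(anchored), length = attr(anchored, "match.length"))
# start length
#    -1     -1
```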
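Another way to solve this is to anchor with \G, which needs the PCRE engine (perl = TRUE). \G is an anchor that can match at one of two positions: the beginning of the string, or the point immediately after the last character consumed by the previous match. Each \s match therefore has to start exactly where the previous one ended, so matching stops at the first non-whitespace character.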
sapply(gregexpr("\\G\\s", s, perl = TRUE), length)
# [1] 7
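A caveat with counting via length(): when there is no match, gregexpr() returns -1 as a one-element vector, so a string with no leading whitespace is reported as 1 instead of 0. The sketch below shows this and one possible adjustment that counts only positive match positions; it is not part of the original answer.

```r
s <- "\t\t\t \t\t\thello world ! \t\t\thello"
x <- c(s, "hello")

m <- gregexpr("\\G\\s", x, perl = TRUE)

# length() also counts the -1 "no match" marker.
sapply(m, length)
# [1] 7 1

# Counting only positive match positions gives 0 when nothing matched.
vapply(m, function(p) sum(p > 0), integer(1))
# [1] 7 0
```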