R beginning match count

Question

I am using R and have the following string below:

s <- "			   			hello    world   !  			hello"

I want to count the whitespace characters at the start of the string only, not anywhere else. The whitespace between the words should be ignored, and only the leading run should be counted. Here the result should be 9.

I have tried the following, but it only returns a count of 1:

sapply(regmatches(s, gregexpr('^(\\s)', s)), length)

I am not very good at regex; any help is appreciated.
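To see why the attempt returns 1, it can help to look at what it actually matched. This is a sketch that assumes the invisible leading characters in the question are 3 tabs, 3 spaces, then 3 tabs (written here with `\t` escapes), which is consistent with the expected count of 9. The anchor `^` only matches at the start of the string, and `\s` without a quantifier consumes exactly one character, so there is only ever one match.

```r
# Assumption: the question's invisible characters are tabs and spaces, written with \t escapes
s <- "\t\t\t   \t\t\thello    world   !  \t\t\thello"

# The question's pattern: ^ anchors to the start, \s (no +) matches a single character
m <- regmatches(s, gregexpr("^(\\s)", s))
m[[1]]          # "\t" (one tab)
length(m[[1]])  # 1
```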

Rich Scriven · Accepted Answer

For matching the first occurrence, regexpr() would be more appropriate than gregexpr(). As a result of that switch, sapply() will no longer be necessary because regexpr() returns an atomic vector whereas gregexpr() returns a list.
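A small sketch of the difference in return types, using a hypothetical vector `x` of three strings. `regexpr()` gives one integer per input string, while `gregexpr()` gives a list with one element per input string, which is why the question's code needed `sapply()`.

```r
x <- c("  a", "\tb", "c")

r <- regexpr("^\\s+", x)   # integer vector: one match start per string
g <- gregexpr("^\\s+", x)  # list: one integer vector of match starts per string

is.list(r)               # FALSE
is.list(g)               # TRUE
length(g)                # 3
attr(r, "match.length")  # 2 1 -1 (-1 means no match)
```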

You could use the following regular expression, looking at the match.length attribute from the result of regexpr().

attr(regexpr("^\\s+", s), "match.length")
# [1] 9
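One detail worth handling: when a string has no leading whitespace, `regexpr()` reports a `match.length` of -1 rather than 0. The following is a minimal sketch of a vectorized wrapper that clamps that to 0. The function name `count_leading_ws` is hypothetical and not part of base R.

```r
# Hypothetical helper: count leading whitespace characters, returning 0 instead of -1
count_leading_ws <- function(x) {
  len <- attr(regexpr("^\\s+", x), "match.length")
  pmax(len, 0L)  # regexpr() reports -1 for strings with no match
}

s <- "\t\t\t   \t\t\thello    world   !  \t\t\thello"
count_leading_ws(c(s, "hello", "  a"))  # 9 0 2
```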

Explanation of the regular expression:

^ Force the regex to be at the beginning of the string.
\s Whitespace characters: tab, newline, vertical tab, form feed, carriage return, and space. Inside an R string literal the backslash must be doubled, so the pattern is written "^\\s+".
+ The preceding item will be matched one or more times.
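To illustrate what each piece of the pattern contributes, here is a quick sketch that drops one token at a time. The second string `x` is a made-up example, not from the question.

```r
s <- "\t\t\t   \t\t\thello    world   !  \t\t\thello"
x <- "hello   world"

# Without ^ the first whitespace run anywhere in the string is matched
attr(regexpr("\\s+", x), "match.length")   # 3
# With ^ a string that does not start with whitespace has no match
attr(regexpr("^\\s+", x), "match.length")  # -1

# A literal space instead of \s misses the leading tabs in s
attr(regexpr("^ +", s), "match.length")    # -1
# Without + only one character is matched
attr(regexpr("^\\s", s), "match.length")   # 1
```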

Reference: http://en.wikibooks.org/wiki/R_Programming/Text_Processing

hwnd · Answer

Another way to solve this is to anchor the pattern with \G. \G is an anchor that can match at one of two positions: the beginning of the string, or the position where the previous match ended. It is a PCRE feature, so perl = TRUE is required.

sapply(gregexpr("\\G\\s", s, perl = TRUE), length)
# [1] 9
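A sketch showing why the \G anchor matters and one caveat of counting with length(). Without \G, gregexpr() finds every whitespace character in the string. Also, when nothing matches, gregexpr() returns -1 (a vector of length 1), so length() alone reports 1 for a string with no leading whitespace. Counting only positive match positions avoids that. The name `count_leading_ws_G` is hypothetical.

```r
s <- "\t\t\t   \t\t\thello    world   !  \t\t\thello"

# Without \G every whitespace character in the string is counted
length(gregexpr("\\s", s)[[1]])  # 21

# Hypothetical helper: count only real matches (positions > 0), so "no match" gives 0
count_leading_ws_G <- function(x) {
  vapply(gregexpr("\\G\\s", x, perl = TRUE),
         function(m) sum(m > 0L), integer(1))
}
count_leading_ws_G(c(s, "hello"))  # 9 0
```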

Tags: regex, r

Asked by chaz