I'm trying to get the location of a white space inside a string but I don't understand the results.
Given the string:
a = "12345,1300 miles"
> gregexpr("\\s", a)
[[1]]
[1] 11
attr(,"match.length")
[1] 1
This makes sense because the white space is at index 11 of the string.
> gregexpr("[\\s]", a)
[[1]]
[1] 16
attr(,"match.length")
[1] 1
This does not make sense to me because index 16 is simply the end of the string. There is no white space there, and I'm wondering why it skipped index 11.
I'm stumped, can anyone give an explanation on why this is happening?
> gregexpr("\\s*", a)
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
attr(,"match.length")
[1] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
This also does not make sense to me because the pattern appears to match at every single character in the string.
Inside a character class you should not use escaped regex shorthands such as \s. With R's default regex engine they are not recognized there: in [\\s] the s is taken literally, which is why your match landed on 16, the final s of "miles". The ?regex page says: "Most metacharacters lose their special meaning inside a character class." I can successfully use the POSIX class [[:space:]] instead:
> grep("[\\s]", "ttt rrr a vvv")
integer(0)
> grep("[[:space:]]", "ttt rrr a vvv")
[1] 1
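As a minimal sketch of the alternatives, the snippet below runs three bracketed patterns against the string from the question. It assumes the default engine for the first and third calls and passes perl = TRUE for the second, where \s keeps its whitespace meaning inside brackets. The variable names are only illustrative.

```r
a <- "12345,1300 miles"

# POSIX class inside a bracket expression, default engine
pos_posix <- as.integer(gregexpr("[[:space:]]", a)[[1]])

# Perl-compatible engine: \s is still a whitespace shorthand inside [...]
pos_perl <- as.integer(gregexpr("[\\s]", a, perl = TRUE)[[1]])

# Default engine: the s inside the brackets is a literal s
pos_default <- as.integer(gregexpr("[\\s]", a)[[1]])

print(c(posix = pos_posix, perl = pos_perl, default = pos_default))
# posix and perl report 11 (the space); default reports 16 (the s in "miles")
```

If you want to keep writing \s inside brackets, perl = TRUE is probably the simplest switch; otherwise [[:space:]] works with the default engine.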
As for the \\s* pattern, it is true that all of those (empty) substrings match it, because * also allows zero occurrences. The behavior of this code is perhaps what you expected:
> gregexpr("\\s.*", a)
[[1]]
[1] 11
attr(,"match.length")
[1] 6
attr(,"useBytes")
[1] TRUE
Or:
> gregexpr("\\s+", a)
[[1]]
[1] 11
attr(,"match.length")
[1] 1
attr(,"useBytes")
[1] TRUE
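To see what each of these two patterns actually captured, the matched text can be pulled out with regmatches. This is only a sketch on the question's string; the variable names are illustrative.

```r
a <- "12345,1300 miles"

# \s.* : the whitespace plus everything after it
tail_match <- regmatches(a, gregexpr("\\s.*", a))[[1]]

# \s+ : one or more whitespace characters only
space_match <- regmatches(a, gregexpr("\\s+", a))[[1]]

print(tail_match)          # " miles"
print(nchar(tail_match))   # 6, the match.length reported above
print(space_match)         # " "
```

Both matches start at position 11; they differ only in how far they extend.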
I can explain the behaviour of the \s* case. The quantifier * matches 0 or more occurrences. Because 0 occurrences are allowed, the pattern also matches at a position where it finds no whitespace:
12345,1300 miles
Your regex \s* sees the first character "1" ==> there is no \s, so it matches 0 occurrences, which means it MATCHES with length 0
Then it goes on to the second character "2" ==> there is no \s, so it matches 0 occurrences, which again means it MATCHES with length 0
On the third character ....
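This regex does not match "every single character in the string"; it matches the empty string at each of those positions. Only at position 11 does it consume an actual whitespace character, which is why that one match has length 1.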
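If you do want to keep \s* for some reason, one possible way to recover the real whitespace position is to drop the zero-length matches using the match.length attribute. This is a sketch, not the only approach, and the variable names are illustrative.

```r
a <- "12345,1300 miles"

m <- gregexpr("\\s*", a)[[1]]
len <- attr(m, "match.length")

# Keep only the matches that consumed at least one character
nonempty <- as.integer(m)[len > 0]

print(nonempty)   # 11
```

In practice, using \s+ (or plain \s) avoids the empty matches in the first place.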