I have a problem parsing addresses in text strings. A typical string looks like "@address token token token" or "@address token token /ntoken".
string <- c("@address token token token", "@address token token /ntoken")
gsub("^\\.?@([a-z0-9_]{1,25})[^a-z0-9_]+.*$", "\\1", string)
which are parsed correctly:
[1] "address" "address"
Yet in some cases the address is the only token in the string, and then the regex returns it with the @ still attached:
string <- c("@address token token token", "@address token token /ntoken", "@address")
gsub("^\\.?@([a-z0-9_]{1,25})[^a-z0-9_]+.*$", "\\1", string)
# [1] "address" "address" "@address"
How can I make the regex also handle the single-token case?
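When the address is the only token in the string, the regex returns it with the @ included because in that case there is no match at all: [^a-z0-9_]+ requires at least one non-word character after the address, and gsub returns any string it cannot match unchanged.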
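A quick way to confirm that the single-token string is simply not matched (so gsub hands it back untouched) is to test the original pattern with grepl, which only reports whether each element matches:

```r
string <- c("@address token token token", "@address token token /ntoken", "@address")
pattern <- "^\\.?@([a-z0-9_]{1,25})[^a-z0-9_]+.*$"

# grepl() returns TRUE/FALSE per element: does the pattern match at all?
grepl(pattern, string)
# [1]  TRUE  TRUE FALSE
```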
Just make a slight change: turn [^a-z0-9_]+ into [^a-z0-9_]? so that the separator after the address becomes optional:
^\.?@([a-z0-9_]{1,25})[^a-z0-9_]?.*$
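In R, with the backslash escaped inside the string literal, the corrected call on the same three inputs from the question would look like this:

```r
string <- c("@address token token token", "@address token token /ntoken", "@address")

# [^a-z0-9_]? lets the address be followed by a separator or by nothing at all
fixed <- gsub("^\\.?@([a-z0-9_]{1,25})[^a-z0-9_]?.*$", "\\1", string)
fixed
# [1] "address" "address" "address"
```

Because the pattern is anchored with ^ and $, it can match at most once per string, so sub() would give the same result as gsub() here.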
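One side effect of making the separator optional: an address longer than 25 characters is no longer rejected but silently cut to its first 25 characters, because .* absorbs the rest. If that matters, a stricter variant (my own suggestion, not part of the answer above) puts the separator and the tail into a single optional group, so the address must be followed either by a non-word character or by the end of the string. A sketch, using a made-up 30-character address to show the difference:

```r
# Hypothetical extra input: an address longer than the 25-character limit
long <- paste0("@", strrep("a", 30))
string <- c("@address token token token", "@address token token /ntoken", "@address", long)

# Optional-separator pattern: truncates the long address to 25 characters
loose <- gsub("^\\.?@([a-z0-9_]{1,25})[^a-z0-9_]?.*$", "\\1", string)

# Stricter variant: separator plus tail form one optional group,
# so a 26th word character makes the whole match fail
strict <- gsub("^\\.?@([a-z0-9_]{1,25})([^a-z0-9_].*)?$", "\\1", string)

nchar(loose[4])    # 25
strict[4] == long  # TRUE: no match, so the string is returned unchanged
```

Both patterns agree on the first three strings; they differ only on over-long addresses.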