R regex to parse the token after @, also when the string has no other tokens

Tags: regex, r, gsub

I have a problem parsing an address out of text strings. A typical string looks like "@address token token token" or "@address token token /ntoken":

string <- c("@address token token token", "@address token token /ntoken")
gsub("^\\.?@([a-z0-9_]{1,25})[^a-z0-9_]+.*$", "\\1", string)

Both are parsed correctly:

[1] "address" "address"

However, in some cases the address is the only token in the string, and then the regex returns the address including the @:

string <- c("@address token token token", "@address token token /ntoken", "@address")
gsub("^\\.?@([a-z0-9_]{1,25})[^a-z0-9_]+.*$", "\\1", string)
# [1] "address"  "address"  "@address"

How can I make the regex also handle the case where the address is the only token?

CptNemo asked Dec 30 '25 17:12


1 Answer

in some circumstances the address will be the only token in the string, then regex will return the address including the @

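That is because in that case there is no match at all: [^a-z0-9_]+ requires at least one non-word character after the address, and gsub returns any element it cannot match unchanged.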
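A quick way to confirm this with base R alone: grepl() shows which elements the question's pattern matches at all, and gsub() hands back the unmatched third element as-is. The variable names here are illustrative only.

```r
# The question's original pattern: [^a-z0-9_]+ demands at least one
# non-word character immediately after the address.
pattern <- "^\\.?@([a-z0-9_]{1,25})[^a-z0-9_]+.*$"
string <- c("@address token token token", "@address token token /ntoken", "@address")

# grepl() reports, per element, whether the pattern matches anywhere.
matched <- grepl(pattern, string)
print(matched)
# [1]  TRUE  TRUE FALSE

# gsub() returns non-matching elements untouched, so "@address" keeps its @.
result <- gsub(pattern, "\\1", string)
print(result)
```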

Just make a slight change: convert [^a-z0-9_]+ into [^a-z0-9_]? to make it optional.

^\.?@([a-z0-9_]{1,25})[^a-z0-9_]?.*$
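A minimal sketch of the fix in R, assuming the same three test strings as the question. The only practical difference from the pattern above is that the backslash must be doubled inside an R string literal.

```r
# The answer's pattern as an R string; [^a-z0-9_]? makes the separator
# after the address optional, so a bare "@address" now matches too.
fixed_pattern <- "^\\.?@([a-z0-9_]{1,25})[^a-z0-9_]?.*$"
string <- c("@address token token token", "@address token token /ntoken", "@address")

# The whole string is replaced by capture group 1 (the address).
parsed <- gsub(fixed_pattern, "\\1", string)
print(parsed)
# [1] "address" "address" "address"
```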

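One side effect may be worth checking, assuming handles longer than 25 characters should be rejected rather than truncated: because .* can absorb whatever follows the {1,25} group, the optional-separator pattern also matches a longer handle and returns only its first 25 characters. The sketch below compares it with an alternative that is not part of the answer, ([^a-z0-9_].*)?$, which makes the separator and the rest of the string one optional group, so the handle must end at a non-word character or at the end of the string. The 30-character handle is a hypothetical test value, and sub() is used because an anchored pattern can match at most once per string.

```r
answer_pattern <- "^\\.?@([a-z0-9_]{1,25})[^a-z0-9_]?.*$"
# Alternative (not from the answer): separator plus tail is one optional group.
strict_pattern <- "^\\.?@([a-z0-9_]{1,25})([^a-z0-9_].*)?$"

# Hypothetical handle of 30 characters, longer than the {1,25} limit.
long_handle <- paste0("@", strrep("a", 30))
string <- c("@address token token token", "@address", long_handle)

loose  <- sub(answer_pattern, "\\1", string)
strict <- sub(strict_pattern, "\\1", string)

print(nchar(loose))              # third element is cut to 25 characters
print(strict[3] == long_handle)  # no match, so the input is returned unchanged
```

Which behaviour is preferable depends on whether an over-long handle should be treated as invalid input or simply clipped.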

Braj answered Jan 02 '26 05:01