R : get string between braces { }

Tags: string, regex, r

UPDATED I need to get the characters between braces { }.

For example,

a <- "{a,b}->{v}"

Output : a,b and v

anz asked May 02 '26 14:05


2 Answers

You can use stringr's str_extract_all.

In the following expression, (?<=\\{) is a lookbehind that requires an opening curly brace immediately before the match, (?=\\}) is a lookahead that requires a closing curly brace immediately after it, and .+? lazily matches the text in between. Hence, the final expression is (?<=\\{).+?(?=\\})

str_extract_all returns a list with one element per input string, so [[1]] selects the matches for a:

library(stringr)
str_extract_all(a, "(?<=\\{).+?(?=\\})")[[1]]

Here is another example:

> a <- "{a,b}->{v}{d}{c}{67}"
> str_extract_all(a, "(?<=\\{).+?(?=\\})")[[1]]
[1] "a,b" "v"   "d"   "c"   "67" 
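To see why the lazy .+? matters, the comparison below is a minimal sketch in base R. It uses regmatches and gregexpr with perl=TRUE in place of str_extract_all so that no package is needed, and the variable names lazy and greedy are only illustrative. With a greedy .+ the match runs from the first { to the last }:

```r
a <- "{a,b}->{v}"

# Lazy quantifier: stop at the first closing brace that follows.
lazy <- regmatches(a, gregexpr("(?<=\\{).+?(?=\\})", a, perl = TRUE))[[1]]

# Greedy quantifier: extend as far as possible, up to the last closing brace.
greedy <- regmatches(a, gregexpr("(?<=\\{).+(?=\\})", a, perl = TRUE))[[1]]

print(lazy)    # "a,b" "v"
print(greedy)  # "a,b}->{v"
```

The lazy version returns one element per brace pair, while the greedy version returns a single element spanning both pairs.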
Rishabh Ojha answered May 04 '26 02:05


If you need to match strings in between curly braces excluding the curly braces, you may use

a <- "{a,b}->{v}"
stringr::str_extract_all(a, "(?<=\\{)[^{}]+(?=\\})")           # With stringr library
# => [1] "a,b" "v"
regmatches(a, gregexpr("(?<=\\{)[^{}]+(?=\\})", a, perl=TRUE)) # Base R approach #1
# => [1] "a,b" "v"
regmatches(a, gregexpr("\\{\\K[^{}]+(?=\\})", a, perl=TRUE))   # Base R approach #2
# => [1] "a,b" "v"

Details:

  • (?<=\{) - a positive lookbehind that requires a { immediately to the left of the current location
  • [^{}]+ - 1 or more (due to the + quantifier) chars other than { and }. The [^...] construct is a negated character class in the PCRE engine (selected by perl=TRUE in base R) and in the ICU engine used by the stringr package; in TRE, the default engine of base R regex functions, it is called a negated bracket expression
  • (?=\}) - a positive lookahead that requires a } immediately to the right of the current location
  • \{\K means that after matching and consuming {, the text matched is discarded from the match value, so the { does not land in the results. See Keep The Text Matched So Far out of The Overall Regex Match for more details.
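As a rough check of the points above, the sketch below runs the lookbehind pattern and the \K pattern on an input that contains an empty {} pair, and compares them with a lazy .+? pattern. The input string and the helper extract_all are assumptions made for illustration, not part of the question:

```r
# Illustrative input (an assumption): note the empty {} pair.
x <- "{}{x}"

# Hypothetical helper: all PCRE matches of `pattern` in a single string.
extract_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

lookaround <- extract_all("(?<=\\{)[^{}]+(?=\\})", x)  # lookbehind + negated class
keep_out   <- extract_all("\\{\\K[^{}]+(?=\\})", x)    # \K variant
lazy_dot   <- extract_all("(?<=\\{).+?(?=\\})", x)     # lazy dot

print(lookaround)  # "x"
print(keep_out)    # "x"
print(lazy_dot)    # "}{x"
```

Because [^{}]+ cannot cross a brace, both negated-class patterns skip the empty pair, whereas .+? needs at least one character and so runs through "}{" before it finds a closing brace.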

To match strings inside non-nested curly braces including the curly braces, you may use

a <- "{a,b}->{v}"
stringr::str_extract_all(a, "\\{[^{}]*\\}")  # With stringr library
regmatches(a, gregexpr("\\{[^{}]*}", a))     # Base R approach
# => [1] "{a,b}" "{v}" 

Here, \{[^{}]*\} matches each substring that starts with {, continues with zero or more chars other than { and } (the [^{}]* part), and ends with }.
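If the inner text is needed as well, one option (a sketch, not one of the approaches listed above) is to match with the braces using the default engine and then strip the first and last character with substr. The input with an empty {} pair and the names with_braces and inner are assumptions for illustration:

```r
# Illustrative input (an assumption): includes an empty {} pair.
a <- "{a,b}->{}{v}"

# Match each non-nested brace pair, braces included (default TRE engine).
with_braces <- regmatches(a, gregexpr("\\{[^{}]*\\}", a))[[1]]

# Drop the leading "{" and the trailing "}" from every match.
inner <- substr(with_braces, 2, nchar(with_braces) - 1)

print(with_braces)  # "{a,b}" "{}" "{v}"
print(inner)        # "a,b" "" "v"
```

Unlike the [^{}]+ patterns, this keeps an empty string for an empty {} pair, which may or may not be what you want.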

Wiktor Stribiżew answered May 04 '26 03:05
