
Regex: get the string between specific underscores

I've seen a lot of similar questions, but I wasn't able to get the desired output.

I have a string means_variab_textimput_x2_200.txt and I want to capture ONLY what is between the second and third underscores: textimput

  • I'm using R with stringr. I've tried many things, but none solved the issue:
library(stringr)

my_string <- "means_variab_textimput_x2_200.txt"

str_extract(my_string, '[_]*[^_]*[_]*[^_]*[_]*[^_]*')
# [1] "means_variab_textimput"

str_extract(my_string, '^(?:([^_]+)_){4}')
# [1] "means_variab_textimput_x2_"

str_extract(my_string, '[_]*[^_]*[_]*[^_]*[_]*[^_]*\\.') ## the closest I got was this
# [1] "_textimput_x2_200."
  • Any ideas? PS: I'm VERY new to regex, so details would be much appreciated :)

  • Additional question: can I also get only a "part" of the word? Say, only text instead of textimput, but without counting the words? It would be good to know both possibilities.

  • This, this one, and this one were helpful, but I couldn't get the final expected result. Thanks in advance.

Larissa Cury asked Jun 24 '26 23:06


1 Answer

stringr uses ICU-based regular expressions. One option would be a regex lookbehind, but the text before the target word has variable length, and ICU requires a lookbehind pattern to be bounded (no * or +), so (?<= with those quantifiers wouldn't work here. Another option is either to remove the unwanted substrings with str_remove, or to use str_replace: match the whole string, capture the third word, i.e. the run of characters that are not _ ([^_]+), and replace everything with the backreference (\\1) to the captured word

library(stringr)
str_replace(my_string, "^[^_]+_[^_]+_([^_]+)_.*", "\\1") 
# [1] "textimput"
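
As an alternative to replacing the whole string, here is a minimal sketch that pulls out the capture group directly with str_match. It is not part of the approach above, only a variation on it; the variable names m and third_field are made up for illustration.

```r
library(stringr)

my_string <- "means_variab_textimput_x2_200.txt"

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 holds the first capture group.
m <- str_match(my_string, "^[^_]+_[^_]+_([^_]+)")

# Keep only the captured third word.
third_field <- m[, 2]
third_field
# [1] "textimput"
```

One possible advantage: when the pattern does not match, str_match gives NA, whereas str_replace returns the input string unchanged.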

If we need only the first four characters of that word

str_replace(my_string, "^[^_]+_[^_]+_([^_]{4}).*", "\\1") 
# [1] "text"

In base R, it is easier to split the string with strsplit and get the third word by indexing

strsplit(my_string, "_")[[1]][3]
# [1] "textimput"
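
If there are several file names rather than one, the same split-and-index idea can be wrapped in a small function. This is only a sketch: nth_field is a hypothetical helper, and every file name except the first is invented for the example.

```r
# Example inputs; only the first name comes from the question.
files <- c("means_variab_textimput_x2_200.txt",
           "sd_group_numeric_x3_10.txt",
           "short_name.txt")

# Hypothetical helper: return the n-th underscore-separated field of each
# element, or NA when an element has fewer than n fields.
nth_field <- function(x, n) {
  parts <- strsplit(x, "_", fixed = TRUE)
  vapply(parts,
         function(p) if (length(p) >= n) p[n] else NA_character_,
         character(1))
}

nth_field(files, 3)
# [1] "textimput" "numeric"   NA
```

Using fixed = TRUE treats "_" as a literal string instead of a regex, which is enough here because the separator is a single known character.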

Or use perl = TRUE in regexpr, so that \\K can discard the part of the match that precedes the third word

regmatches(my_string, regexpr("^([^_]+_){2}\\K[^_]+", my_string, perl = TRUE))
# [1] "textimput"
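
The \\K pattern can also be built for any field position. The sketch below assumes a hypothetical helper, nth_field_regex, that is not part of the original answer; it just fills the repetition count in with sprintf.

```r
my_string <- "means_variab_textimput_x2_200.txt"

# Hypothetical helper: extract the n-th underscore-separated field.
nth_field_regex <- function(x, n) {
  # Skip n - 1 "field_" groups, reset the match start with \K,
  # then match the n-th run of non-underscore characters.
  pattern <- sprintf("^([^_]+_){%d}\\K[^_]+", as.integer(n - 1))
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

nth_field_regex(my_string, 3)
# [1] "textimput"
nth_field_regex(my_string, 4)
# [1] "x2"
```

Note that regmatches drops elements that do not match, so with a vector input the result can be shorter than the input.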

For the four-character substring

regmatches(my_string, regexpr("^([^_]+_){2}\\K[^_]{4}", my_string, perl = TRUE))
# [1] "text"
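
Another way to get part of the word, shown here only as a sketch, is to do it in two steps: extract the whole third word first, then cut it with substr. The variable names word and prefix are made up for the example, and the length 4 is the same assumption used in the patterns above.

```r
my_string <- "means_variab_textimput_x2_200.txt"

# Step 1: take the whole third underscore-separated word.
word <- strsplit(my_string, "_", fixed = TRUE)[[1]][3]

# Step 2: keep only its first four characters.
prefix <- substr(word, 1, 4)
prefix
# [1] "text"
```

This keeps the regex simple and makes the cut length easy to change without touching the pattern.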
akrun answered Jun 26 '26 14:06