I have the following vector c:
ABC-XXX
DEF-4-YYY
I want to extract everything before the last occurrence of '-', meaning that I would keep this:
ABC
DEF-4
I've tried the following:
sub('[-].*', '', "DEF-4-YYY")
But this replaces everything after the first '-', while it should look for the last '-'. Output of the above command is:
"DEF"
What is wrong here?
In R, substr() and substring() can be used either to extract part of a character string or to replace part of it. Both are vectorized, so they work on a character vector or on a data frame column.
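As a minimal sketch of both uses, the example below applies substr() and substring() to the vector from the question. The variable names x and y are only illustrative.

```r
x <- c("ABC-XXX", "DEF-4-YYY")

# Extract characters 1 to 3 of each element
first_three <- substr(x, 1, 3)      # "ABC" "DEF"

# substring() defaults to the end of the string when 'last' is omitted
from_fifth <- substring(x, 5)       # "XXX" "4-YYY"

# Replacement form: overwrite characters 1 to 3 in a copy of x
y <- x
substr(y, 1, 3) <- "ZZZ"            # y is now "ZZZ-XXX" "ZZZ-4-YYY"
```

Note that neither function finds the position of the last '-' on its own, so they only help here once that position is known.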
To extract words from a string, we can use the word() function from the stringr package. For example, if x is a string containing 100 words, the first 20 words can be extracted with word(x, start = 1, end = 20, sep = fixed(" ")).
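The following sketch assumes the stringr package is installed. It shows word() on an illustrative sentence, and then on the vector from the question with '-' as the separator, where a negative end counts back from the last piece. The names sentence, first_three and before_last are made up for this example.

```r
# Assumption: stringr is installed (install.packages("stringr"))
library(stringr)

sentence <- "the quick brown fox jumps over the lazy dog"

# First three space-separated words
first_three <- word(sentence, start = 1, end = 3, sep = fixed(" "))

# Treat '-' as the separator and keep everything up to the
# second-to-last piece, i.e. everything before the last '-'
v1 <- c("ABC-XXX", "DEF-4-YYY")
before_last <- word(v1, start = 1, end = -2, sep = fixed("-"))
```

This relies on every element containing at least one '-'. Elements without a separator would need separate handling.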
To remove the last character of a string, we can use the built-in substring() function in R. It accepts three arguments: the string, the start position, and the end position. Setting the end position to nchar(x) - 1 drops the final character.
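A short sketch of dropping the last character, using nchar() to compute the end position. The variable names are illustrative only.

```r
x <- c("ABC-XXX", "DEF-4-YYY")

# Keep characters 1 through (length - 1) of each element
without_last <- substring(x, 1, nchar(x) - 1)   # "ABC-XX" "DEF-4-YY"
```

This removes a fixed number of characters from the end, so it does not solve the question above, where the part after the last '-' can have any length.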
We can do this with sub by matching a - followed by zero or more characters that are not a - till the end ($) of the string and replacing it with blank ('').
v1 <- c('ABC-XXX', 'DEF-4-YYY')
sub('-[^-]*$', '', v1)
#[1] "ABC"   "DEF-4"
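To see why the original attempt returned "DEF", it may help to compare the patterns side by side. The sketch below also shows one possible alternative that relies on a greedy capture group instead of a negated character class. The result variable names are made up for illustration.

```r
v1 <- c("ABC-XXX", "DEF-4-YYY")

# sub() finds the leftmost match, so '-.*' starts at the FIRST '-'
# and '.*' then consumes the rest of the string
from_first <- sub("-.*", "", v1)           # "ABC" "DEF"

# '[^-]*$' cannot cross another '-', so only the LAST '-' can match
from_last <- sub("-[^-]*$", "", v1)        # "ABC" "DEF-4"

# Alternative: the greedy '(.*)' grabs as much as possible, so the
# '-' after it is forced to be the last one; keep only the group
greedy <- sub("(.*)-.*", "\\1", v1)        # "ABC" "DEF-4"
```

Both of the last two patterns leave a string unchanged when it contains no '-' at all.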