I have a matrix like this (each row is a string):
m <- matrix(c("Agarista revoluta (Spreng.) Hook. f. ex Nied.",
"Amaioua intermedia Mart.",
"Baccharis reticularia DC."),, 1)
I would like to remove the text after the second space and to return:
Agarista revoluta
Amaioua intermedia
Baccharis reticularia
I tried some combinations with gsub
but I did not succeed.
Can anyone help me with this?
Select a blank cell, enter the formula =RemoveAfterLastSpace(A2) (A2 is the cell from which you want to remove all characters after the last space), and then drag the Fill Handle over the range you need.
To get the text following a specific character, you use a slightly different approach: get the position of the character with either SEARCH or FIND, subtract that number from the total string length returned by the LEN function, and extract that many characters from the end of the string.
Select the cell where you want the result, type the formula =MID(LEFT(A1,FIND(">",A1)-1),FIND("<",A1)+1,LEN(A1)), and press the Enter key. Note: A1 is the text cell, and < and > are the two characters between which you want to extract the string.
You may use
x <- c("Agarista revoluta (Spreng.) Hook. f. ex Nied.", "Amaioua intermedia Mart.", "Baccharis reticularia DC.")
sub("^(\\S*\\s+\\S+).*", "\\1", x)
## => [1] "Agarista revoluta" "Amaioua intermedia" "Baccharis reticularia"
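Since the question stores the names in a one-column matrix rather than a plain vector, here is a minimal sketch that applies the same pattern to m directly. Assigning through m[] is just one way to do it, chosen here so that the matrix shape is explicitly kept:

```r
m <- matrix(c("Agarista revoluta (Spreng.) Hook. f. ex Nied.",
              "Amaioua intermedia Mart.",
              "Baccharis reticularia DC."), , 1)
# Overwrite the cells in place; assigning via m[] keeps the 3 x 1 shape
m[] <- sub("^(\\S*\\s+\\S+).*", "\\1", m)
m
```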
See the regex demo and an online R demo.
Pattern details:
^ - start of string
(\\S*\\s+\\S+) - Group 1 capturing 0+ non-whitespace chars, then 1+ whitespaces, and then 1+ non-whitespaces
.* - any 0+ chars, as many as possible (up to the end of string).
Note that in case your strings might have leading whitespace, and you do not want to count that whitespace in, you should use
sub("^\\s*(\\S+\\s+\\S+).*", "\\1", x)
See another R demo
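To see how the two patterns differ, here is a small sketch on made-up inputs (the vector y is hypothetical, not from the question): one string with leading spaces and one with a single word only.

```r
# Hypothetical inputs: leading spaces in the first, a single word in the second
y <- c("  Agarista revoluta (Spreng.) Hook.", "Agarista")
first  <- sub("^(\\S*\\s+\\S+).*", "\\1", y)
second <- sub("^\\s*(\\S+\\s+\\S+).*", "\\1", y)
first   # the leading spaces are counted as the first whitespace run
second  # the leading spaces are skipped before the two words are captured
```

With the first pattern the leading spaces are kept and only one word survives ("  Agarista"), while the second returns "Agarista revoluta". A string with no second word does not match either pattern, so sub returns it unchanged.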