
Remove text after the second space

Tags: string, regex, r

I have a matrix like this (each row is a string):

m <- matrix(c("Agarista revoluta (Spreng.) Hook. f. ex Nied.",
              "Amaioua intermedia Mart.",
              "Baccharis reticularia DC."), ncol = 1)

I would like to remove the text after the second space and to return:

Agarista revoluta
Amaioua intermedia
Baccharis reticularia

I tried some combinations with gsub but I did not succeed.

Can anyone help me with this?

Asked by Karlo Guidoni Martins on Dec 21 '16 at 13:12




1 Answer

You may use

x <- c("Agarista revoluta (Spreng.) Hook. f. ex Nied.", "Amaioua intermedia Mart.", "Baccharis reticularia DC.")
sub("^(\\S*\\s+\\S+).*", "\\1", x)
## => [1] "Agarista revoluta"     "Amaioua intermedia"    "Baccharis reticularia"


Pattern details:

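  • ^ - start of string
  • (\\S*\\s+\\S+) - Group 1, capturing 0+ non-whitespace chars, then 1+ whitespace chars, and then 1+ non-whitespace chars
  • .* - any 0+ chars, as many as possible (up to the end of the string)
  • \\1 (in the replacement) - a backreference to Group 1, so the whole match is replaced with the captured text only.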
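
The same call can be applied to the one-column matrix from the question. The following is only a minimal sketch: it assumes m is built as in the question, and the variable name single is introduced here purely for illustration. Assigning into m[] is one way to make sure the matrix shape is kept.

```r
# One-column character matrix, as in the question
m <- matrix(c("Agarista revoluta (Spreng.) Hook. f. ex Nied.",
              "Amaioua intermedia Mart.",
              "Baccharis reticularia DC."), ncol = 1)

# Keep only the text up to the end of the second word in each entry;
# assigning into m[] replaces the values but keeps the dimensions
m[] <- sub("^(\\S*\\s+\\S+).*", "\\1", m)
m

# A string with no whitespace does not match the pattern,
# so sub() returns it unchanged (hypothetical input)
single <- sub("^(\\S*\\s+\\S+).*", "\\1", "Agarista")
single
```

Because sub() leaves non-matching strings untouched, entries that contain only one word pass through as they are.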

If your strings may have leading whitespace that should not be counted, use

sub("^\\s*(\\S+\\s+\\S+).*", "\\1", x)
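
To see the difference between the two patterns, here is a small sketch with a made-up input y that starts with two spaces (y, first and second are hypothetical names, not part of the question):

```r
# Hypothetical input with two leading spaces
y <- "  Agarista revoluta (Spreng.) Hook. f. ex Nied."

# First pattern: \\S* matches nothing, so the leading spaces are used as the
# separator and only one word is kept
first <- sub("^(\\S*\\s+\\S+).*", "\\1", y)

# Second pattern: the leading whitespace is matched outside the group,
# so the group captures the first two words
second <- sub("^\\s*(\\S+\\s+\\S+).*", "\\1", y)

first   # "  Agarista"
second  # "Agarista revoluta"
```

The second form also drops the leading whitespace from the result, since it is matched before the capturing group starts.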


Answered by Wiktor Stribiżew on Sep 30 '22 at 03:09