Extract letters from a string in R

I have a character vector containing variable names such as x <- c("AB.38.2", "GF.40.4", "ABC.34.2"). I want to extract the letters so that I end up with a character vector containing only the letters, e.g. c("AB", "GF", "ABC").

Because the number of letters varies, I cannot use substring to specify the first and last characters.

How can I go about this?

Moose asked Jun 18 '15 09:06


2 Answers

You can try:

sub("^([[:alpha:]]*).*", "\\1", x)
[1] "AB"  "GF"  "ABC"
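
To spell out how that pattern works, here is a minimal sketch using the vector from the question. The variable name `prefix` and the extra test string "AB.38.cd" are only for illustration. Note that this approach keeps only the leading run of letters, so letters that appear later in the string are dropped.

```r
x <- c("AB.38.2", "GF.40.4", "ABC.34.2")

# ^               anchors the match at the start of the string
# ([[:alpha:]]*)  captures the leading run of letters (may be empty)
# .*              matches the rest, so the whole string gets replaced
# "\\1"           puts back only the captured group
prefix <- sub("^([[:alpha:]]*).*", "\\1", x)
print(prefix)

# Letters after the first non-letter are not kept:
later <- sub("^([[:alpha:]]*).*", "\\1", "AB.38.cd")
print(later)  # "AB"
```

If the letters are always a prefix, as in the question, this is probably all that is needed.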
Mamoun Benghezal answered Oct 04 '22 03:10


The previous answers seem more complicated than necessary. The approach from a related question about extracting digits also works with letters:

> x <- c("AB.38.2", "GF.40.4", "ABC.34.2", "A B ..C 312, Fd", "  a")
> gsub("[^a-zA-Z]", "", x)
[1] "AB"    "GF"    "ABC"   "ABCFd" "a" 
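
The two patterns on this page are not equivalent, which may matter depending on the data. A rough side-by-side sketch follows (the variable names `leading_only` and `all_letters` are just for illustration): sub() with the anchored pattern keeps only the leading letters, while gsub() with the negated class removes every non-letter wherever it appears.

```r
x <- c("AB.38.2", "A B ..C 312, Fd", "  a")

# Keep only the run of letters at the very start of each string
leading_only <- sub("^([[:alpha:]]*).*", "\\1", x)

# Delete every character that is not an ASCII letter, anywhere in the string
all_letters <- gsub("[^a-zA-Z]", "", x)

print(leading_only)  # "AB" "A" ""
print(all_letters)   # "AB" "ABCFd" "a"
```

For inputs like the ones in the question both give the same result. They differ only when letters also occur after the first non-letter character, or when the string does not start with a letter.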
Bernard Beckerman answered Oct 04 '22 03:10