 

Extracting unique numbers from string in R

Tags:

regex

r

I have a list of strings which contain random characters such as:

list = list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"

I'd like to know which numbers appear at least once in this list (i.e., the unique() set of numbers). For my example, the solution is:

solution: c(7,667,11,5,2)

A method that treats 11 not as "eleven" but as two separate digits ("one" and "one") would also be useful. In that case, the solution would be:

solution: c(7,6,1,5,2)

(I found this post on a related subject: Extracting numbers from vectors of strings)

Remi.b asked Jun 09 '13 12:06




1 Answer

For the second case (individual digits), you can use gsub to remove every character that is not a digit, then split the remaining string into single characters:

unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
# [1] 7 6 1 5 2
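To make the one-liner above easier to follow, here is a sketch that runs the same pipeline one step at a time on the question's data. The intermediate variable names (x, digits_only, single_digits, result) are illustrative only and are not part of the original answer.

```r
# The question's data, stored as ll rather than list
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# 1. Flatten the list into a plain character vector
x <- unlist(ll)

# 2. Delete every character that is not a digit 0-9
digits_only <- gsub("[^0-9]", "", x)
digits_only
# [1] "7667" "7115" "275"

# 3. split = "" breaks each string between every character,
#    so each digit becomes its own element
single_digits <- unlist(strsplit(digits_only, ""))
single_digits
# [1] "7" "6" "6" "7" "7" "1" "1" "5" "2" "7" "5"

# 4. Convert to numbers and keep the first occurrence of each value
result <- unique(as.numeric(single_digits))
result
# [1] 7 6 1 5 2
```

Because unique() keeps values in order of first appearance, the result matches the order given in the question.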

For the first case (whole numbers), you can similarly use strsplit, this time splitting on runs of non-digit characters:

unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
# [1]   7 667  11   5   2
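The na.omit() call is there for a reason that is easy to miss: when a string starts with a non-digit (as "djud7+dg[a]hs667" does), strsplit() returns an empty string "" as the first piece, and as.numeric("") is NA. The sketch below shows that step explicitly. It also includes an alternative that is not from the original answer: matching runs of digits directly with base R's gregexpr() and regmatches(), which never produces empty pieces. Variable names here are illustrative.

```r
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
x <- unlist(ll)

# Split on runs of one or more non-digits; the remaining pieces are whole numbers.
# A match at the start of a string yields a leading "" piece;
# a match at the end (like the ")" in "7fd*hac11(5)") yields nothing.
pieces <- unlist(strsplit(x, "[^0-9]+"))
# pieces contains "", "7", "667", "7", "11", "5", "2", "7", "5"

# The "" becomes NA after conversion, so drop it before taking unique values
nums <- as.numeric(pieces)
result <- unique(na.omit(nums))
result
# [1]   7 667  11   5   2

# Alternative (not from the answer): extract each run of digits directly
alt <- unique(as.numeric(unlist(regmatches(x, gregexpr("[0-9]+", x)))))
alt
# [1]   7 667  11   5   2
```

Changing the pattern in the alternative from "[0-9]+" to "[0-9]" would match single digits instead, giving the digit-by-digit result from the other case.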

PS: Don't name your variable list, since there is a built-in function list(). I've named your data ll instead.
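As a sketch of that suggestion, the data can be built in a single call under a different name. Since every element is a single string, a plain character vector holds the same information; the name x below is a hypothetical choice for illustration. With a character vector, the unlist() applied to the input in the one-liners is no longer needed (the inner unlist() around strsplit() still is, because strsplit() always returns a list).

```r
# Build the example data directly, without a variable named `list`
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# Each element is one string, so unlist() gives an equivalent character vector
x <- unlist(ll)
same_data <- identical(x, c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5"))
same_data
# [1] TRUE

# The digit-by-digit version, applied to the character vector directly
digits <- unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", x), ""))))
digits
# [1] 7 6 1 5 2
```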

Arun answered Sep 21 '22 22:09