 

Extracting numbers from vectors of strings

Tags:

regex

r



How about

# pattern: capture the first run of digits and drop everything after it
as.numeric(gsub("([0-9]+).*$", "\\1", years))

or

# pattern: just remove the " years old" suffix
as.numeric(gsub(" years old", "", years))

or

# split on spaces and take the first token of each string
as.numeric(sapply(strsplit(years, " "), "[[", 1))
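For a self-contained check, the three base-R approaches above can be run side by side. The question's input is not shown on this page, so the `years` vector below is a hypothetical reconstruction that matches the `20  1` output quoted in the answers; the `fixed = TRUE` argument is an addition so the suffix is matched literally rather than as a regex.

```r
# Hypothetical input: reconstructed to match the "20  1" output shown in the answers
years <- c("20 years old", "1 years old")

# 1. Capture the first run of digits and drop everything after it
by_capture <- as.numeric(gsub("([0-9]+).*$", "\\1", years))

# 2. Delete the literal suffix " years old"
by_suffix <- as.numeric(gsub(" years old", "", years, fixed = TRUE))

# 3. Split on spaces and keep the first token of each string
by_split <- as.numeric(sapply(strsplit(years, " "), `[[`, 1))

# All three agree on this input
stopifnot(identical(by_capture, by_suffix), identical(by_capture, by_split))
by_capture
```

Note that only the first approach does not depend on the exact wording of the suffix or on the number being the first token.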

Update: since extract_numeric is deprecated, we can use parse_number from the readr package.

library(readr)
parse_number(years)
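If readr is not available, a rough base-R sketch of "take the first number in each string" is below. It is not how parse_number is implemented and, unlike parse_number, it ignores grouping marks and locale settings; the helper name `first_number` and its pattern (optional minus sign, digits, optional decimal part) are assumptions for illustration.

```r
# Hypothetical helper: first number in each string, NA where none is found.
# Pattern assumed here: optional "-", one or more digits, optional ".digits".
first_number <- function(x) {
  m <- regexpr("-?[0-9]+(\\.[0-9]+)?", x)
  out <- rep(NA_real_, length(x))
  # regmatches() returns only the matched elements, so fill them back by position
  out[m != -1] <- as.numeric(regmatches(x, m))
  out
}

first_number(c("20 years old", "1.5 years old", "no digits", "-3 degrees"))
# returns 20, 1.5, NA, -3
```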

Here is another option with extract_numeric from tidyr:

library(tidyr)
extract_numeric(years)
#[1] 20  1

I think that substitution is an indirect way of getting to the solution. If you want to retrieve all the numbers, I recommend gregexpr:

matches <- regmatches(years, gregexpr("[[:digit:]]+", years))
as.numeric(unlist(matches))

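If you have multiple matches in a string, this will get all of them. If you're only interested in the first match, use regexpr instead of gregexpr; regmatches then returns a plain character vector, so you can skip the unlist (strings with no match are dropped from the result).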
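To see the difference between gregexpr and regexpr on strings with several numbers or none, here is a small sketch on a hypothetical vector. Converting each list element with lapply keeps track of which string each number came from, which unlist() would flatten away.

```r
# Hypothetical input with several, one, and zero numbers per string
x <- c("2 cats and 3 dogs", "10 fish", "none")

# gregexpr: every match, one list element per input string
all_nums <- lapply(regmatches(x, gregexpr("[[:digit:]]+", x)), as.numeric)
str(all_nums)

# regexpr: only the first match; the string without a match is dropped
first_nums <- as.numeric(regmatches(x, regexpr("[[:digit:]]+", x)))
first_nums  # 2 and 10
```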


Here's an alternative to Arun's first solution, with a simpler Perl-like regular expression:

as.numeric(gsub("[^\\d]+", "", years, perl=TRUE))

Or simply:

as.numeric(gsub("\\D", "", years))
# [1] 20  1
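One caveat with deleting every non-digit: all digits left in a string are glued together, so decimals and multiple numbers are silently merged. A quick check on hypothetical inputs, with a digit-run extraction (pattern assumed to allow an optional decimal part) for comparison:

```r
# Hypothetical inputs, not from the original question
x <- c("20 years old", "1.5 years old", "born 1990, now 33")

# Deleting every non-digit concatenates the remaining digits
glued <- as.numeric(gsub("\\D", "", x))
glued  # 20, then 15 (not 1.5), then 199033 (two numbers merged)

# Extracting number-shaped runs keeps separate numbers apart
runs <- lapply(regmatches(x, gregexpr("[0-9]+(\\.[0-9]+)?", x)), as.numeric)
str(runs)
```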

A stringr pipelined solution:

library(stringr)
years %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric

You could get rid of all the letters too:

as.numeric(gsub("[[:alpha:]]", "", years))

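This is likely less generalizable, though: it only works when everything besides the letters and the number is whitespace, so punctuation such as a hyphen or colon is left behind and as.numeric() returns NA.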
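To illustrate the limitation, here is a sketch on hypothetical inputs containing punctuation, compared with extracting the first run of digits via regexpr:

```r
# Hypothetical inputs with punctuation around the number
x <- c("20 years old", "20 years-old", "age: 20")

# Stripping only letters leaves "20  ", "20 -" and ": 20";
# as.numeric() ignores surrounding whitespace but not "-" or ":"
stripped <- suppressWarnings(as.numeric(gsub("[[:alpha:]]", "", x)))
stripped  # 20, NA, NA

# Matching the first run of digits is unaffected by the punctuation
first_run <- as.numeric(regmatches(x, regexpr("[0-9]+", x)))
first_run  # 20, 20, 20
```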