In this method to extract numbers from character string vector, the user has to call the gsub() function which is one of the inbuilt function of R language, and pass the pattern for the first occurrence of the number in the given strings and the vector of the string as the parameter of this function and in return, this ...
How about
# pattern is by finding a set of numbers in the start and capturing them
as.numeric(gsub("([0-9]+).*$", "\\1", years))
or
# pattern is to just remove _years_old
as.numeric(gsub(" years old", "", years))
or
# split by space, get the element in first index
as.numeric(sapply(strsplit(years, " "), "[[", 1))
Update
Since extract_numeric is deprecated, we can use parse_number from readr package.
library(readr)
parse_number(years)
Here is another option with extract_numeric
library(tidyr)
extract_numeric(years)
#[1] 20 1
I think that substitution is an indirect way of getting to the solution. If you want to retrieve all the numbers, I recommend gregexpr:
matches <- regmatches(years, gregexpr("[[:digit:]]+", years))
as.numeric(unlist(matches))
If you have multiple matches in a string, this will get all of them. If you're only interested in the first match, use regexpr instead of gregexpr and you can skip the unlist.
Here's an alternative to Arun's first solution, with a simpler Perl-like regular expression:
as.numeric(gsub("[^\\d]+", "", years, perl=TRUE))
Or simply:
as.numeric(gsub("\\D", "", years))
# [1] 20 1
A stringr pipelined solution:
library(stringr)
years %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric
You could get rid of all the letters too:
as.numeric(gsub("[[:alpha:]]", "", years))
Likely this is less generalizable though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With