In this method of extracting numbers from a character vector, the user calls gsub(), one of R's built-in functions, and passes it a pattern that matches the first occurrence of a number in the given strings, together with the vector of strings; in return, this ...
How about
# the pattern captures the run of digits at the start of each string and drops the rest
as.numeric(gsub("([0-9]+).*$", "\\1", years))
or
# the pattern simply removes the " years old" suffix
as.numeric(gsub(" years old", "", years))
or
# split by space, get the element in first index
as.numeric(sapply(strsplit(years, " "), "[[", 1))
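To make the three variants above easy to compare, here is a minimal sketch that runs them side by side. The contents of `years` are an assumption (the vector is not defined in this thread); the names `by_capture`, `by_suffix` and `by_split` are only illustrative.

```r
# Assumed sample input; the original `years` vector is not shown in the thread.
years <- c("20 years old", "1 years old")

# Capture the leading run of digits and discard everything after it.
by_capture <- as.numeric(gsub("([0-9]+).*$", "\\1", years))

# Delete the fixed " years old" suffix.
by_suffix <- as.numeric(gsub(" years old", "", years))

# Split each string on spaces and keep the first token.
by_split <- as.numeric(sapply(strsplit(years, " "), "[[", 1))

print(by_capture)
print(by_suffix)
print(by_split)
```

With this input all three should agree. The suffix-removal variant depends on every string ending in exactly " years old", so it is the most brittle of the three.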
Update: Since extract_numeric is deprecated, we can use parse_number from the readr package.
library(readr)
parse_number(years)
Here is another option with extract_numeric
library(tidyr)
extract_numeric(years)
#[1] 20 1
I think that substitution is an indirect way of getting to the solution. If you want to retrieve all the numbers, I recommend gregexpr:
matches <- regmatches(years, gregexpr("[[:digit:]]+", years))
as.numeric(unlist(matches))
If you have multiple matches in a string, this will get all of them. If you're only interested in the first match, use regexpr instead of gregexpr and you can skip the unlist.
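A small sketch of the difference between the two calls, assuming a hypothetical input `ages` in which each string contains more than one number:

```r
# Hypothetical input with two numbers per string (not from the original question).
ages <- c("20 years and 3 months", "1 year and 11 months")

# gregexpr finds every match; regmatches returns a list with one element per string.
all_matches <- regmatches(ages, gregexpr("[[:digit:]]+", ages))
all_numbers <- as.numeric(unlist(all_matches))

# regexpr finds only the first match; regmatches returns a plain character vector.
first_numbers <- as.numeric(regmatches(ages, regexpr("[[:digit:]]+", ages)))

print(all_numbers)    # every number, in order of appearance
print(first_numbers)  # the first number of each string
```

One thing to keep in mind: with regexpr, regmatches drops strings that have no match at all, so the result can be shorter than the input.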
Here's an alternative to Arun's first solution, with a simpler Perl-like regular expression:
as.numeric(gsub("[^\\d]+", "", years, perl=TRUE))
Or simply:
as.numeric(gsub("\\D", "", years))
# [1] 20 1
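A caveat worth checking before using this on other data: deleting every non-digit glues together digits that belonged to separate numbers. A minimal sketch, with a hypothetical input `ages`:

```r
# Hypothetical input containing two separate numbers.
ages <- "20 years and 3 months"

# Removing all non-digit characters leaves the digits of both numbers side by side.
digits_only <- gsub("\\D", "", ages)
merged <- as.numeric(digits_only)

print(digits_only)
print(merged)
```

Here the 20 and the 3 end up as a single value, so this approach is only safe when each string holds exactly one number.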
A stringr pipelined solution:
library(stringr)
years %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric
You could get rid of all the letters too:
as.numeric(gsub("[[:alpha:]]", "", years))
This is likely less generalizable, though.
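To illustrate why, here is a small sketch with a hypothetical vector `labels`: stripping letters leaves punctuation behind, and as.numeric then fails on the leftover text.

```r
# Hypothetical input; the second element contains punctuation as well as letters.
labels <- c("20 years old", "age: 20")

# Only letters are removed, so spaces and the colon survive.
stripped <- gsub("[[:alpha:]]", "", labels)

# Surrounding whitespace is tolerated by as.numeric, but ": 20" is not a number.
# suppressWarnings hides the "NAs introduced by coercion" warning.
result <- suppressWarnings(as.numeric(stripped))

print(stripped)
print(result)
```

The first element still converts because only whitespace is left around the digits; the second becomes NA.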