I have a dataset in R with people's complete ages stored as strings (e.g., "10 years 8 months 23 days"), and I need to transform them into a numeric variable that makes sense. I thought about converting each one to age in days, but that is hard because months have different numbers of days. So the best solution might be a double variable that shows age as, say, 10.6 or 10.8: some numeric variable that preserves the fact that 10 years 8 months 5 days is greater than 10 years 7 months 12 days.
Here is an example of the current variable I have:
library(tibble)
age <- tibble(complete_age =
c("10 years 8 months 23 days",
"9 years 11 months 7 days",
"11 years 3 months 1 day",
"8 years 6 months 12 days"))
age
# A tibble: 4 x 1
complete_age
<chr>
1 10 years 8 months 23 days
2 9 years 11 months 7 days
3 11 years 3 months 1 day
4 8 years 6 months 12 days
Here is an example of a possible outcome I would love to see (with approximate values for age_num):
> age
# A tibble: 4 x 2
complete_age age_num
<chr> <dbl>
1 10 years 8 months 23 days 10.66
2 9 years 11 months 7 days 9.92
3 11 years 3 months 1 day 11.27
4 8 years 6 months 12 days 8.52
In summary, I have a dataset with the "complete_age" column, and I want to create the column "age_num."
How can I do that in R? I'm having a hard time trying to use stringr and lubridate, but maybe this is the way to go?
Using the lubridate convenience functions period and time_length:
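library(dplyr)      # provides %>% and mutate()
library(lubridate)
age %>%
  # period() parses each string; time_length() expresses the period in years
  mutate(age_years = time_length(period(complete_age), unit = "years"))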
# A tibble: 4 x 2
# complete_age age_years
# <chr> <dbl>
# 1 10 years 8 months 23 days 10.729637
# 2 9 years 11 months 7 days 9.935832
# 3 11 years 3 months 1 day 11.252738
# 4 8 years 6 months 12 days 8.532854
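To see where these decimals come from, here is a hand check of the first row in base R. It is only a sketch and assumes that the conversion counts a year as 365.25 days and a month as one twelfth of a year, which is consistent with the output above.

```r
# Hand check of the first row, "10 years 8 months 23 days".
# Assumption: 1 year = 365.25 days and 1 month = 1/12 of a year.
years  <- 10
months <- 8
days   <- 23

age_years <- years + months / 12 + days / 365.25
round(age_years, 6)
# [1] 10.729637
```

The result agrees with the age_years value shown for the first row.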
Split on spaces, then compute. Note that you may want to adjust the average number of days per year and per month as needed:
library(dplyr)
library(tidyr)  # separate() comes from tidyr
age %>%
  # NA entries in `into` drop the "years", "months" and "days" words
  separate(complete_age, into = c("Y", NA, "M", NA, "D", NA),
           convert = TRUE, remove = FALSE) %>%
  transmute(complete_age, age_num = Y + (M * 30.45 + D) / 365.25)
# A tibble: 4 x 2
# complete_age age_num
# <chr> <dbl>
# 1 10 years 8 months 23 days 10.7
# 2 9 years 11 months 7 days 9.94
# 3 11 years 3 months 1 day 11.3
# 4 8 years 6 months 12 days 8.53
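If the averages need tuning, one option is to make them arguments. The following is a minimal base R sketch, not part of the answer above: age_to_years is a hypothetical helper name, and the defaults of 30.4375 days per month and 365.25 days per year are assumptions you can override. It also assumes every string contains exactly three numbers in year, month, day order.

```r
# Hypothetical helper: extract the three numbers from each string and
# combine them using adjustable calendar averages.
age_to_years <- function(x, days_per_month = 30.4375, days_per_year = 365.25) {
  # Every run of digits, e.g. "10 years 8 months 23 days" -> c("10", "8", "23")
  parts <- regmatches(x, gregexpr("[0-9]+", x))
  vapply(parts, function(p) {
    stopifnot(length(p) == 3)  # assumes year, month and day are all present
    n <- as.numeric(p)
    n[1] + (n[2] * days_per_month + n[3]) / days_per_year
  }, numeric(1))
}

complete_age <- c("10 years 8 months 23 days",
                  "9 years 11 months 7 days",
                  "11 years 3 months 1 day",
                  "8 years 6 months 12 days")

age_num <- age_to_years(complete_age)
round(age_num, 2)
# [1] 10.73  9.94 11.25  8.53
```

Because the year term dominates and the month term outweighs the day term, the ordering asked for in the question holds: "10 years 8 months 5 days" comes out larger than "10 years 7 months 12 days".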
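Here is an alternative approach: remove all letters ('[A-Za-z]') with str_remove_all, separate the remaining numbers into year, month and day columns, convert them to numeric with type.convert(as.is = TRUE), compute the age in years, and add the original column back with bind_cols: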
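library(dplyr)
library(stringr)
library(tidyr)  # separate() comes from tidyr
age %>%
  mutate(complete_age = str_remove_all(complete_age, "[A-Za-z]")) %>%
  # extra = "drop" discards any empty piece left by the trailing whitespace
  separate(complete_age, c("year", "month", "day"), extra = "drop") %>%
  type.convert(as.is = TRUE) %>%
  mutate(ageYear = (year + month/12 + day/365), .keep = "unused") %>%
  bind_cols(age)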
ageYear complete_age
<dbl> <chr>
1 10.7 10 years 8 months 23 days
2 9.94 9 years 11 months 7 days
3 11.3 11 years 3 months 1 day
4 8.53 8 years 6 months 12 days
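If age in days is preferred, as first considered in the question, the same extracted numbers can be combined differently. This is a rough base R sketch, assuming averages of 365.25 days per year and 30.4375 days per month, so the result is approximate rather than calendar-exact.

```r
# Sketch: age in approximate days instead of years.
# Assumptions: 365.25 days per year, 30.4375 days per month (averages only).
complete_age <- c("10 years 8 months 23 days",
                  "9 years 11 months 7 days",
                  "11 years 3 months 1 day",
                  "8 years 6 months 12 days")

# Pull out the digits of each string, then stack them into a matrix
# with one row per person and columns year, month, day.
nums <- regmatches(complete_age, gregexpr("[0-9]+", complete_age))
m <- do.call(rbind, lapply(nums, as.numeric))

age_days <- m[, 1] * 365.25 + m[, 2] * 30.4375 + m[, 3]
age_days
# [1] 3919.000 3629.062 4110.062 3116.625
```

Dividing age_days by 365.25 gives back an age in years on the same scale.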