Transforming complete age from character to numeric in R

I have a dataset in R with people's complete age stored as strings (e.g., "10 years 8 months 23 days"), and I need to transform it into a meaningful numeric variable. I thought about converting it to the person's age in days, but that is hard because months have different numbers of days. So the best solution might be a double variable that shows age as, say, 10.6 or 10.8: some numeric variable that carries the information that 10 years 8 months 5 days is greater than 10 years 7 months 12 days.
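
A minimal sketch of the total-days idea, assuming an average year of 365.25 days and an average month of 365.25 / 12 = 30.4375 days (a convention, not exact calendar arithmetic; the helper `total_days()` is hypothetical). Mapping every unit onto the same day scale keeps the ordering asked for here:

```r
# Assumed convention: average Gregorian year; a month is 1/12 of that year
days_per_year  <- 365.25
days_per_month <- days_per_year / 12   # 30.4375

# Hypothetical helper: approximate age in days from its components
total_days <- function(y, m, d) y * days_per_year + m * days_per_month + d

a <- total_days(10, 8, 5)   # 10 years 8 months 5 days  -> 3901 days
b <- total_days(10, 7, 12)  # 10 years 7 months 12 days -> 3877.5625 days
a > b                       # TRUE

# Dividing by days_per_year gives fractional years with the same ordering
a / days_per_year > b / days_per_year  # TRUE
```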

Here is an example of the current variable:

library(tibble)

age <- tibble(complete_age = 
             c("10 years 8 months 23 days",
               "9 years 11 months 7 days",
               "11 years 3 months 1 day",
               "8 years 6 months 12 days")) 

age

# A tibble: 4 x 1
  complete_age             
  <chr>                    
1 10 years 8 months 23 days
2 9 years 11 months 7 days 
3 11 years 3 months 1 day  
4 8 years 6 months 12 days 

Here is an example of the outcome I would like to see (the age_num values are approximate):

> age
# A tibble: 4 x 2
  complete_age              age_num
  <chr>                       <dbl>
1 10 years 8 months 23 days    10.66
2 9 years 11 months 7 days      9.92
3 11 years 3 months 1 day      11.27
4 8 years 6 months 12 days      8.52

In summary, I have a dataset with the "complete_age" column, and I want to create the column "age_num".

How can I do that in R? I'm having a hard time using stringr and lubridate, but maybe that is the way to go?

asked Dec 01 '21 20:12 by Ruam Pimentel


3 Answers

Using the lubridate convenience functions period() and time_length():

library(dplyr)      # %>% and mutate()
library(lubridate)  # period() and time_length()

age %>% 
  mutate(age_years = time_length(period(complete_age), unit = "years"))

# # A tibble: 4 x 2
#   complete_age              age_years
#   <chr>                         <dbl>
# 1 10 years 8 months 23 days 10.729637
# 2 9 years 11 months 7 days   9.935832
# 3 11 years 3 months 1 day   11.252738
# 4 8 years 6 months 12 days   8.532854
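
The numbers above are consistent with time_length() treating a year as 365.25 days and a month as one twelfth of a year. A quick base-R check of that reading, with the components of the four example ages typed in by hand (this only reproduces the output; it is not lubridate's implementation):

```r
# Year, month, and day components of the four example ages, entered by hand
y <- c(10, 9, 11, 8)
m <- c(8, 11, 3, 6)
d <- c(23, 7, 1, 12)

# Assumed conversion: 1 month = 1/12 year, 1 day = 1/365.25 year
age_years <- y + m / 12 + d / 365.25
round(age_years, 6)
```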
answered Nov 04 '22 20:11 by Henrik


Split on spaces, then compute. Note that you may want to adjust the average number of days per month and per year as needed:

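library(dplyr)  # %>%, transmute()
library(tidyr)  # separate()

age %>% 
  # keep the numbers; the NA entries drop the unit words
  separate(complete_age, into = c("Y", NA, "M", NA, "D", NA), 
           convert = TRUE, remove = FALSE) %>% 
  transmute(complete_age, age_num = Y + (M * 30.45 + D) / 365.25)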

# # A tibble: 4 x 2
#   complete_age                 age_num
#   <chr>                          <dbl>
# 1 10 years 8 months 23 days      10.7 
# 2 9 years 11 months 7 days        9.94
# 3 11 years 3 months 1 day        11.3 
# 4 8 years 6 months 12 days        8.53
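
separate() assumes every string has exactly three number/unit pairs in the same order. If some strings could omit a unit (for example "3 months 2 days"; this is an assumption, since the question's data always has all three), a regex-based extraction in base R is one option. A minimal sketch with zx8754's constants as default arguments; extract_unit() and age_to_years() are hypothetical helper names:

```r
# Hypothetical helper: the number in front of a unit word ("year", "month", "day"),
# singular or plural; returns 0 when that unit is absent from the string
extract_unit <- function(x, unit) {
  hits <- regmatches(x, regexec(paste0("([0-9]+) ", unit, "s?"), x))
  vapply(hits, function(h) if (length(h) == 2) as.numeric(h[2]) else 0, numeric(1))
}

# Hypothetical helper: fractional years, using the answer's constants as defaults
age_to_years <- function(x, days_per_month = 30.45, days_per_year = 365.25) {
  extract_unit(x, "year") +
    (extract_unit(x, "month") * days_per_month + extract_unit(x, "day")) / days_per_year
}

ages <- c("10 years 8 months 23 days", "9 years 11 months 7 days",
          "11 years 3 months 1 day", "8 years 6 months 12 days",
          "3 months 2 days")
round(age_to_years(ages), 2)
```

With the question's tibble, `age$age_num <- age_to_years(age$complete_age)` would add the column.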
answered Nov 04 '22 19:11 by zx8754


Here is an alternative approach:

  1. Remove all alphabetic characters ('[A-Za-z]') with str_remove_all
  2. Separate the remaining numbers into year, month, and day columns
  3. Convert them to numeric with type.convert(as.is = TRUE) and apply the calculation
  4. Bind the result back to the original column with bind_cols

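library(dplyr)
library(stringr)
library(tidyr)  # separate()

age %>% 
  # drop the unit words; str_squish() trims the leftover spaces
  # so separate() sees exactly three pieces
  mutate(complete_age = str_squish(str_remove_all(complete_age, "[A-Za-z]"))) %>% 
  separate(complete_age, c("year", "month", "day")) %>% 
  type.convert(as.is = TRUE) %>% 
  mutate(ageYear = (year + month/12 + day/365), .keep = "unused") %>% 
  bind_cols(age)
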
  ageYear complete_age             
    <dbl> <chr>                    
1   10.7  10 years 8 months 23 days
2    9.94 9 years 11 months 7 days 
3   11.3  11 years 3 months 1 day  
4    8.53 8 years 6 months 12 days 
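
The three answers differ only in how months and days are converted to years: the lubridate output matches month = 1/12 year and day = 1/365.25 year, zx8754 uses 30.45/365.25 and 1/365.25, and TarJae uses 1/12 and 1/365. A small base-R comparison on the question's four ages (components entered by hand) checks how far apart these conventions are and whether they rank the ages the same way, which is the property the question needs:

```r
# Year, month, and day components of the four example ages, entered by hand
y <- c(10, 9, 11, 8)
m <- c(8, 11, 3, 6)
d <- c(23, 7, 1, 12)

# Conventions used in the answers above
henrik <- y + m / 12 + d / 365.25         # reproduces the lubridate output
zx8754 <- y + (m * 30.45 + d) / 365.25
tarjae <- y + m / 12 + d / 365

# Largest disagreement between any two conventions, in years
max(abs(henrik - zx8754), abs(henrik - tarjae), abs(zx8754 - tarjae))

# Same ranking of the four ages under every convention?
identical(order(henrik), order(zx8754)) && identical(order(henrik), order(tarjae))
```

On this data the conventions disagree by less than 0.001 years and give the same ranking; a dataset with near-ties could be sensitive to the choice.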
answered Nov 04 '22 21:11 by TarJae