
Create dataframe using formats defined in separate dataframe template

Tags: dataframe, r

I am creating multiple dataframes, and I want the columns in each of them to have the same types as those specified in a blank dataframe template I have created.

For example, I have a blank template:

template <- data.frame(
  char = character(),
  int = integer(),
  fac1 = factor(levels = c('level1', 'level2', 'level3')),
  fac2 = factor(levels = c('level4', 'level5')),
  stringsAsFactors = FALSE
)

I then want to create a few dataframes whose columns keep the format of the template (i.e. char should be a character, and fac2 should be a factor with the two levels 'level4' and 'level5'):

df1 <- data.frame(
  char = c('a', 'b'),
  int = c(1,2),
  fac1 = c('level2', 'level1'),
  fac2 = c('level4', 'level4')
)

df2 <- data.frame(
  char = c('c', 'd'),
  int = c(3,4),
  fac1 = c('level3', 'level4'),
  fac2 = c('level5', 'level4')
)

I can obviously specify the column types when I create df1 and df2, but I want to avoid typing out the same thing multiple times, and if, for example, the levels of a factor change, I only want to change them in one place.

If a value appears in one of the factor columns that is not a level of that factor (e.g. 'level4' in 'fac1' of 'df2' above), it should be replaced by NA when the column is converted to the correct format.

user1165199 asked Mar 28 '18 11:03


2 Answers

Maybe you can just post-process your data frame:

df_template <- function(...) {
  df <- data.frame(...)
  df$char <- as.character(df$char)
  df$int  <- as.integer(df$int)
  df$fac1 <- factor(df$fac1, levels = c('level1', 'level2', 'level3'))
  df$fac2 <- factor(df$fac2, levels = c('level4', 'level5'))
  df
}
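
As a rough usage sketch (the wrapper is repeated so the snippet runs on its own, and the input values are those of df2 from the question), the frame is built through df_template() instead of data.frame():

```r
# Wrapper from the answer above, repeated so this snippet is self-contained.
df_template <- function(...) {
  df <- data.frame(...)
  df$char <- as.character(df$char)
  df$int  <- as.integer(df$int)
  df$fac1 <- factor(df$fac1, levels = c('level1', 'level2', 'level3'))
  df$fac2 <- factor(df$fac2, levels = c('level4', 'level5'))
  df
}

# Build df2 from the question through the wrapper.
df2 <- df_template(
  char = c('c', 'd'),
  int  = c(3, 4),
  fac1 = c('level3', 'level4'),  # 'level4' is not a level of fac1
  fac2 = c('level5', 'level4')
)

str(df2)
is.na(df2$fac1)  # the value outside the declared levels has become NA
```

The types and levels still live in one place (the wrapper), but they are hard-coded there rather than read from the template object, so the wrapper has to be edited whenever the template changes.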
F. Privé answered Nov 14 '22 23:11


We can create a function that checks the type of each column of the template and uses the matching as.* function to coerce the corresponding column of the data.frame to that type.

We make an exception for factors (their typeof is integer, so as.integer would not rebuild them); for those we create a factor with the template column's levels instead.
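
A small illustration of why factors need their own branch; the vectors f and y below are made up for this sketch and are not part of the original answer:

```r
# A factor with the same levels as fac1 in the template.
f <- factor(c('level2', 'level1'), levels = c('level1', 'level2', 'level3'))

typeof(f)      # "integer": factors are stored as integer codes
class(f)       # "factor"
as.integer(f)  # 2 1, i.e. the codes, not the labels

# Rebuilding the factor from the template's levels keeps the labels and
# turns any value outside those levels into NA.
y <- c('level3', 'level4')
g <- factor(y, levels = levels(f))
g
```

Dispatching on typeof() alone would therefore call as.integer on a factor column, which is why the is.factor() check comes first.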

Map walks through the columns of template and of the input data.frame in pairs, and its output (a list) is then converted to a data.frame.

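format_df <- function(df, template) {
  as.data.frame(
    Map(function(x, y) {
      # x: template column, y: column of df at the same position
      if (is.factor(x))
        factor(y, levels(x))  # values outside levels(x) become NA
      else
        match.fun(paste0("as.", typeof(x)))(y)  # e.g. as.character, as.integer
        # or `class<-`(y, class(x)), same effect for the given example
    }, template, df),
    stringsAsFactors = FALSE)
}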


df1b <- format_df(df1,template)
# char int   fac1   fac2
# 1    a   1 level2 level4
# 2    b   2 level1 level4

str(df1b)
# 'data.frame': 2 obs. of  4 variables:
# $ char: chr  "a" "b"
# $ int : int  1 2
# $ fac1: Factor w/ 3 levels "level1","level2",..: 2 1
# $ fac2: Factor w/ 2 levels "level4","level5": 1 1

Note that level5 is kept as a level of fac2 in the output, even though it does not occur in df1.
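
The same function can be tried on df2 from the question, which contains 'level4' in fac1. This is a sketch that repeats the template and the function so it runs on its own; df2b is just an illustrative name:

```r
template <- data.frame(
  char = character(),
  int = integer(),
  fac1 = factor(levels = c('level1', 'level2', 'level3')),
  fac2 = factor(levels = c('level4', 'level5')),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  char = c('c', 'd'),
  int = c(3, 4),
  fac1 = c('level3', 'level4'),  # 'level4' is not a level of fac1
  fac2 = c('level5', 'level4')
)

# Same logic as format_df above, repeated here for a self-contained run.
format_df <- function(df, template) {
  as.data.frame(
    Map(function(x, y) {
      if (is.factor(x)) factor(y, levels(x))
      else match.fun(paste0("as.", typeof(x)))(y)
    }, template, df),
    stringsAsFactors = FALSE)
}

df2b <- format_df(df2, template)
str(df2b)
is.na(df2b$fac1)  # the second entry is NA, as the question asked
```

Because Map pairs the columns by position, this assumes df has its columns in the same order as template; reordering first with df[names(template)] is one option if that is not guaranteed.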

Moody_Mudskipper answered Nov 14 '22 21:11