I am creating multiple dataframes, and I want the columns in each of them to have the same types as specified in a blank dataframe template I have created.
For example, I have this blank template:
template <- data.frame(
  char = character(),
  int = integer(),
  fac1 = factor(levels = c('level1', 'level2', 'level3')),
  fac2 = factor(levels = c('level4', 'level5')),
  stringsAsFactors = FALSE
)
I then want to create a few dataframes while keeping the columns in the format of the template (i.e. char should be a character column and fac2 a factor with the two levels 'level4' and 'level5'):
df1 <- data.frame(
  char = c('a', 'b'),
  int = c(1, 2),
  fac1 = c('level2', 'level1'),
  fac2 = c('level4', 'level4')
)
df2 <- data.frame(
  char = c('c', 'd'),
  int = c(3, 4),
  fac1 = c('level3', 'level4'),
  fac2 = c('level5', 'level4')
)
I can obviously specify the column types when I create df1 and df2, but I want to avoid typing the same thing multiple times, and if, for example, the levels of a factor change, I only want to change them in one place.
If a value in one of the factor columns is not a valid level (e.g. 'level4' in fac1 of df2 above), it should be replaced by NA when converting to the correct format.
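For reference, base R's factor() already behaves this way when given explicit levels: any value not among them becomes NA. A minimal check using the fac1 levels from the template above (the variable names fac1_levels and x are just for illustration):

```r
# Levels of fac1, copied from the template above
fac1_levels <- c('level1', 'level2', 'level3')

# 'level4' is not a level of fac1, so factor() maps it to NA
x <- factor(c('level3', 'level4'), levels = fac1_levels)
print(x)
# [1] level3 <NA>
# Levels: level1 level2 level3
```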
Maybe you can just post-process your data frame:
df_template <- function(...) {
  df <- data.frame(...)
  # coerce each column to its intended type
  df$char <- as.character(df$char)
  df$int <- as.integer(df$int)
  # the levels are defined only here; values outside them become NA
  df$fac1 <- factor(df$fac1, levels = c('level1', 'level2', 'level3'))
  df$fac2 <- factor(df$fac2, levels = c('level4', 'level5'))
  df
}
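A usage sketch for df_template, reusing the df2 values from the question: the wrapper is called in place of data.frame(), so the types and levels are written only once. The function body is repeated here so the snippet runs on its own.

```r
df_template <- function(...) {
  df <- data.frame(...)
  df$char <- as.character(df$char)
  df$int <- as.integer(df$int)
  df$fac1 <- factor(df$fac1, levels = c('level1', 'level2', 'level3'))
  df$fac2 <- factor(df$fac2, levels = c('level4', 'level5'))
  df
}

# Build df2 through the wrapper instead of calling data.frame() directly
df2 <- df_template(
  char = c('c', 'd'),
  int = c(3, 4),                 # doubles, converted to integer
  fac1 = c('level3', 'level4'),  # 'level4' is not a fac1 level, so NA
  fac2 = c('level5', 'level4')
)
str(df2)
```

The trade-off is that the column names and types are hard-coded in the function rather than read from a template object.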
We can create a function that checks the type of each column of the template and uses the matching as.* function to coerce the corresponding column of the input data.frame to that type.
We make an exception for factors (their type is integer) and instead assign the template's levels to the new column.
Map takes the columns of template and of the input pairwise, and its output (a list) is then converted to a data.frame.
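To see why factors need the special case, here is a small sketch: typeof() of a factor column is "integer", so the generic as.* branch alone would turn a factor into a plain integer vector and lose its levels. The template from the question is re-created so the snippet is self-contained; types, coerce_int and y are illustrative names.

```r
template <- data.frame(
  char = character(),
  int = integer(),
  fac1 = factor(levels = c('level1', 'level2', 'level3')),
  fac2 = factor(levels = c('level4', 'level5')),
  stringsAsFactors = FALSE
)

# typeof() of each template column: the factors report "integer"
types <- sapply(template, typeof)
print(types)

# Build the coercion function from the type name, as format_df does
coerce_int <- match.fun(paste0("as.", typeof(template$int)))
y <- coerce_int(c(1, 2))  # double input, integer output
```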
format_df <- function(df, template) {
  as.data.frame(
    Map(function(x, y) {
      if (is.factor(x))
        factor(y, levels(x))
      else
        match.fun(paste0("as.", typeof(x)))(y)
      # or `class<-`(y, class(x)), same effect for the given example
    }, template, df),
    stringsAsFactors = FALSE)
}
df1b <- format_df(df1, template)
df1b
# char int fac1 fac2
# 1 a 1 level2 level4
# 2 b 2 level1 level4
str(df1b)
# 'data.frame': 2 obs. of 4 variables:
# $ char: chr "a" "b"
# $ int : int 1 2
# $ fac1: Factor w/ 3 levels "level1","level2",..: 2 1
# $ fac2: Factor w/ 2 levels "level4","level5": 1 1
Note that level5 is kept as a level of fac2 in the output, even though no row uses it.
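Applying the same function to df2 from the question shows the out-of-level value ('level4' in fac1) becoming NA. Everything is re-defined so the snippet runs standalone. Because Map pairs columns by position, the sketch first reorders df2 with df2[names(template)]; that reordering step is an addition, not part of the answer above, and only matters if the column order might differ from the template's.

```r
template <- data.frame(
  char = character(),
  int = integer(),
  fac1 = factor(levels = c('level1', 'level2', 'level3')),
  fac2 = factor(levels = c('level4', 'level5')),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  char = c('c', 'd'),
  int = c(3, 4),
  fac1 = c('level3', 'level4'),
  fac2 = c('level5', 'level4')
)

format_df <- function(df, template) {
  as.data.frame(
    Map(function(x, y) {
      if (is.factor(x))
        factor(y, levels(x))
      else
        match.fun(paste0("as.", typeof(x)))(y)
    }, template, df),
    stringsAsFactors = FALSE)
}

# Put df2's columns in template order (by name) before the positional Map
df2b <- format_df(df2[names(template)], template)
str(df2b)
```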