Guess correct column storage mode from data.frame of strings



Given a data.frame containing columns of only strings (no factors), some of which should remain strings, some of which are integers, and some of which are doubles, how can I guess the most appropriate storage mode to which to convert the strings?

fixDf <- data.frame(isChar=c("A", "B", "C"), 
  isDouble=c("0.01", "0.02", "0.03"), 
  isInteger=c("1", "2", "3"), stringsAsFactors=FALSE)

I am wondering if there is an easy way to determine that the following needs to be done, and then to do it:

mode(fixDf[, "isDouble"]) <- "double"
mode(fixDf[, "isInteger"]) <- "integer"

Ideally, a function that handles this would leave a column in its string form wherever the conversion runs into errors.

2 Answers

You can use colwise from the plyr package together with the type.convert function.

library(plyr)
foo = colwise(type.convert)(fixDf)
str(foo)
'data.frame':   3 obs. of  3 variables:
 $ isChar   : Factor w/ 3 levels "A","B","C": 1 2 3
 $ isDouble : num  0.01 0.02 0.03
 $ isInteger: int  1 2 3

Or using base R:

as.data.frame(lapply(fixDf, type.convert))
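
The str() output above shows isChar coming back as a factor, while the question asks for strings to stay strings. A minimal base R sketch of one way around that is to pass as.is = TRUE to type.convert. The names fixed and mixed are only illustrative and not from the original answer:

```r
# Rebuild the example data from the question.
fixDf <- data.frame(isChar = c("A", "B", "C"),
                    isDouble = c("0.01", "0.02", "0.03"),
                    isInteger = c("1", "2", "3"),
                    stringsAsFactors = FALSE)

# as.is = TRUE keeps columns that cannot be read as logical/numeric
# as plain character vectors instead of turning them into factors.
# Assigning into fixed[] keeps the data.frame structure intact.
fixed <- fixDf
fixed[] <- lapply(fixDf, type.convert, as.is = TRUE)

sapply(fixed, class)
# isChar: "character", isDouble: "numeric", isInteger: "integer"

# A column that only partly looks numeric is left as character,
# which matches the "leave it as a string on failure" requirement.
mixed <- type.convert(c("1", "x", "3"), as.is = TRUE)
class(mixed)
# "character"
```

Passing as.is explicitly also avoids depending on the default, which is not the same across R versions.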

type_convert from readr does exactly what you want, operating on an entire data frame. It handles logical, numeric (integer and double), strings, and dates/times well, without coercing to factor.


To parse columns individually, use parse_guess.
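
As a rough sketch of both calls on the data from the question, assuming readr is installed: the guess_integer argument used here is an assumption about the installed version (it is present in recent readr releases, where integer guessing is off by default, and may be missing in older ones, in which case drop it). The name converted is only illustrative.

```r
library(readr)  # assumption: readr is installed

# Rebuild the example data from the question.
fixDf <- data.frame(isChar = c("A", "B", "C"),
                    isDouble = c("0.01", "0.02", "0.03"),
                    isInteger = c("1", "2", "3"),
                    stringsAsFactors = FALSE)

# Whole data frame at once; character columns stay character.
# guess_integer = TRUE asks readr to guess integer rather than double
# for whole-number columns (version-dependent argument, see above).
converted <- type_convert(fixDf, guess_integer = TRUE)
str(converted)

# One column at a time.
parse_guess(fixDf$isChar)
parse_guess(fixDf$isDouble)
parse_guess(fixDf$isInteger, guess_integer = TRUE)
```

type_convert prints the column specification it guessed, which is a quick way to check the result before relying on it.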

