Guess correct column storage mode from data.frame of strings

Tags:

r

Given a data.frame containing columns of only strings (no factors), some of which should remain strings, some of which are integers, and some of which are doubles, how can I guess the most appropriate storage mode to which to convert the strings?

fixDf <- data.frame(isChar=c("A", "B", "C"), 
  isDouble=c("0.01", "0.02", "0.03"), 
  isInteger=c("1", "2", "3"), stringsAsFactors=FALSE)

I am wondering if there is an easy way to determine that the following needs to be done, and then to do it:

mode(fixDf[, "isDouble"]) <- "double"
mode(fixDf[, "isInteger"]) <- "integer"

Ideally, a function that handles this would leave a column in its string form whenever the conversion runs into errors.

digitalmaps asked Jan 14 '13 19:01

2 Answers

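You can use colwise from the plyr package together with the type.convert function. Pass as.is = TRUE so that columns which cannot be read as numbers stay character; in older versions of R the default (as.is = FALSE) turns them into factors.

library(plyr)
foo = colwise(type.convert, as.is = TRUE)(fixDf)

str(foo)


'data.frame':   3 obs. of  3 variables:
 $ isChar   : chr  "A" "B" "C"
 $ isDouble : num  0.01 0.02 0.03
 $ isInteger: int  1 2 3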

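Or using base R (as.is = TRUE and stringsAsFactors = FALSE keep isChar as character rather than factor):

as.data.frame(lapply(fixDf, type.convert, as.is = TRUE), stringsAsFactors = FALSE)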
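To illustrate the "leave it as a string when conversion fails" requirement from the question, here is a minimal base R sketch. The data frame mixedDf and its column names are made up for this example and are not part of the original post. It relies only on type.convert with as.is = TRUE, which returns a column unchanged as character when its entries cannot all be read as numbers.

```r
# Hypothetical example data: the "mixed" column contains a non-numeric entry.
mixedDf <- data.frame(
  mixed = c("1", "2", "x"),
  whole = c("1", "2", "3"),
  frac  = c("0.5", "1.5", "2.5"),
  stringsAsFactors = FALSE
)

# Convert every column in place; assigning into mixedDf[] keeps the
# data.frame structure instead of replacing it with a plain list.
mixedDf[] <- lapply(mixedDf, type.convert, as.is = TRUE)

# Inspect the storage type each column ended up with.
types <- sapply(mixedDf, typeof)
print(types)
```

Assigning into mixedDf[] avoids the round trip through as.data.frame, so there is no second place where strings could be turned into factors.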
Justin answered Oct 08 '22 22:10

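type_convert from the readr package does what you want, operating on an entire data frame. It handles logicals, numbers, strings, and dates/times well, without coercing to factor. Note that recent readr versions appear to guess whole-number columns as double by default; check the guess_integer argument in the readr documentation if you need integer columns.

library(readr)
type_convert(fixDf)

To parse columns individually, use parse_guess.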
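A minimal sketch of the per-column approach, assuming the installed readr version provides the guess_integer argument of parse_guess (it may be missing in older releases). The name fixed is introduced here only for illustration.

```r
library(readr)

# Same data as in the question, rebuilt so this block runs on its own.
fixDf <- data.frame(isChar = c("A", "B", "C"),
  isDouble = c("0.01", "0.02", "0.03"),
  isInteger = c("1", "2", "3"), stringsAsFactors = FALSE)

# Guess each column separately. guess_integer = TRUE (assumed to be
# available in this readr version) asks for integer rather than double
# when every value in a column is a whole number.
fixed <- fixDf
fixed[] <- lapply(fixDf, parse_guess, guess_integer = TRUE)

# Show the storage type chosen for each column.
print(sapply(fixed, typeof))
```

Columns that readr cannot interpret as anything more specific are returned as character, which matches the request to leave such data in string form.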

qwr answered Oct 08 '22 21:10
