I seem to spend a lot of time creating a data frame from a file, database or something similar, and then converting each column into the type I want (numeric, factor, character, etc.). Is there a way to do this in one step, possibly by giving a vector of types?
    foo <- data.frame(x = c(1:10),
                      y = c("red", "red", "red", "blue", "blue", "blue",
                            "yellow", "yellow", "yellow", "green"),
                      z = Sys.Date() + c(1:10))
    foo$x <- as.character(foo$x)
    foo$y <- as.character(foo$y)
    foo$z <- as.numeric(foo$z)

Instead of the last three commands, I'd like to do something like:

    foo <- convert.magic(foo, c(character, character, numeric))
Edit: See this related question for some simplifications and extensions of this basic idea.
Following up on my comment to Brandon's answer, here is a version using switch:

    convert.magic <- function(obj, types) {
      # types: character vector with one type name per column of obj
      for (i in seq_along(obj)) {
        FUN <- switch(types[i],
                      character = as.character,
                      numeric   = as.numeric,
                      factor    = as.factor)
        obj[[i]] <- FUN(obj[[i]])
      }
      obj
    }

    out <- convert.magic(foo, c('character', 'character', 'numeric'))
    > str(out)
    'data.frame': 10 obs. of 3 variables:
     $ x: chr "1" "2" "3" "4" ...
     $ y: chr "red" "red" "red" "blue" ...
     $ z: num 15254 15255 15256 15257 15258 ...
For truly large data frames you may want to use lapply instead of the for loop:

    convert.magic1 <- function(obj, types) {
      # Convert each column with the function named in types, one per column
      out <- lapply(seq_along(obj), FUN = function(i) {
        FUN1 <- switch(types[i],
                       character = as.character,
                       numeric   = as.numeric,
                       factor    = as.factor)
        FUN1(obj[[i]])
      })
      names(out) <- colnames(obj)
      as.data.frame(out, stringsAsFactors = FALSE)
    }
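One weakness of the switch-based functions is that a misspelled or unsupported type name matches no branch, so the call fails with an error that does not point at the bad name. The following is a minimal sketch of a variant that checks the type names first. The name convert_columns and the lookup list converters are hypothetical and not part of the original answers.

```r
# Hypothetical helper (not from the original answers): same idea as
# convert.magic, but it validates the requested type names up front.
convert_columns <- function(df, types) {
  stopifnot(length(types) == ncol(df))
  converters <- list(character = as.character,
                     numeric   = as.numeric,
                     factor    = as.factor)
  unknown <- setdiff(types, names(converters))
  if (length(unknown) > 0) {
    stop("Unsupported type(s): ", paste(unknown, collapse = ", "))
  }
  # Assigning into df[] keeps the data frame structure and column names
  df[] <- Map(function(col, type) converters[[type]](col), df, types)
  df
}

foo <- data.frame(x = 1:3,
                  y = c("red", "blue", "green"),
                  z = c("1.5", "2.5", "3.5"),
                  stringsAsFactors = FALSE)
out <- convert_columns(foo, c("character", "factor", "numeric"))
sapply(out, class)
```

Adding another target type then only needs a new entry in converters, for example integer = as.integer.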
When doing this, be aware of some of the intricacies of coercing data in R. For example, converting from factor to numeric often involves as.numeric(as.character(...)). Also, be aware that in R versions before 4.0.0, data.frame() and as.data.frame() convert character columns to factor by default (stringsAsFactors = TRUE); since R 4.0.0 the default is FALSE.
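To illustrate the factor-to-numeric pitfall mentioned above, here is a small sketch with made-up values. Calling as.numeric() directly on a factor returns the internal integer codes of the levels, not the numbers that the labels spell out.

```r
# Illustrative values only
f <- factor(c("10", "20", "10", "30"))

codes  <- as.numeric(f)                # level codes: 1 2 1 3
values <- as.numeric(as.character(f))  # intended values: 10 20 10 30

# Equivalent alternative described in ?factor
values2 <- as.numeric(levels(f))[f]

print(codes)
print(values)
```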
If you want to detect each column's data type automatically rather than specify it manually (e.g. after data tidying), the function type.convert() may help.

type.convert() takes a character vector and attempts to determine the most suitable type for all of its elements, which means it has to be applied once per column:
df[] <- lapply(df, function(x) type.convert(as.character(x)))
Since I love dplyr, I prefer:

    library(dplyr)
    # funs() is deprecated since dplyr 0.8.0; across() needs dplyr >= 1.0.0
    df <- df %>% mutate(across(everything(), ~ type.convert(as.character(.x))))
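As a rough illustration of the base R type.convert() approach, the sketch below applies it to a small data frame whose columns are all character. The data and the names raw and converted are made up. Passing as.is = TRUE is my own addition: it keeps non-numeric strings as character instead of turning them into factors, and it makes the result the same across R versions.

```r
# Made-up example data: every column starts out as character
raw <- data.frame(id    = c("1", "2", "3"),
                  price = c("1.5", "2.25", "3"),
                  label = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

converted <- raw
# as.is = TRUE (an assumption, not in the original answer) keeps strings
# as character rather than converting them to factor
converted[] <- lapply(raw, function(x) type.convert(as.character(x), as.is = TRUE))

sapply(converted, class)
```

Here id becomes integer, price becomes numeric, and label stays character.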