I want to convert an entire data.frame containing more than 130 columns to numeric. I know that I need to use as.numeric, but the problem is that I would have to apply this function separately to each of the 130 columns. I tried applying it to the entire data.frame, but I got the following error message:
Error: (list) object cannot be coerced to type 'double'
How can I do this with relatively short code?
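The error happens because a data.frame is a list of columns, and as.numeric() only accepts an atomic vector. A minimal base R sketch that reproduces it (the small df below is made up for illustration):

```r
# Hypothetical small data frame standing in for the 130-column one
df <- data.frame(a = c(1L, 2L), b = c(3L, 4L))

# A data.frame is a list of columns, not an atomic vector
is.list(df)  # TRUE

# so as.numeric() on the whole object raises the coercion error
err <- tryCatch(as.numeric(df), error = function(e) e)
conditionMessage(err)

# A single extracted column is an atomic vector and converts fine
as.numeric(df[["a"]])  # 1 2
```

This is why every answer below works column by column, either with an explicit loop or with lapply().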
To convert columns of an R data frame from integer to numeric, we can use the lapply function. For example, if a data frame df contains only integer columns, df[] <- lapply(df, as.numeric) converts all of them to numeric. Assigning into df[] keeps the result a data frame; lapply(df, as.numeric) on its own returns a plain list.
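A quick check of that distinction, using a made-up integer data frame:

```r
# Hypothetical data frame with integer columns only
df <- data.frame(x = 1:3, y = 4:6)
sapply(df, class)  # both "integer"

# lapply() alone gives back a list, not a data.frame
res <- lapply(df, as.numeric)
class(res)  # "list"

# Assigning into df[] replaces the columns in place and keeps the data.frame
df[] <- lapply(df, as.numeric)
class(df)          # "data.frame"
sapply(df, class)  # both "numeric"
```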
Convert all columns of a data frame to numeric in R
To convert all the columns of a data frame to numeric in R, use lapply() to loop over the columns and apply as.numeric(). If the columns are factors, convert them to character first, because as.numeric() on a factor returns the internal level codes rather than the labels.
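A small sketch of the factor pitfall, with a made-up factor whose labels look like numbers. The last idiom, as.numeric(levels(f))[f], is the one recommended in R FAQ 7.10:

```r
# Hypothetical factor column whose labels look like numbers
f <- factor(c("10", "5", "20"))

# as.numeric() on a factor returns the level codes, not the labels;
# the levels are sorted as strings: "10", "20", "5"
as.numeric(f)                # 1 3 2

# Going through character first recovers the intended values
as.numeric(as.character(f))  # 10 5 20

# Equivalent idiom: convert the levels once, then index them by the codes
as.numeric(levels(f))[f]     # 10 5 20

# Applying the character-first conversion to every column of a data frame
df <- data.frame(a = factor(c("1.5", "2", "3")), b = factor(c("7", "8", "9")))
df[] <- lapply(df, function(col) as.numeric(as.character(col)))
str(df)
```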
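An option with dplyr (in dplyr 1.0.0 and later, mutate_all() is superseded by mutate(across(everything(), as.numeric)), which gives the same result):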
library(dplyr)
df1 %>%
mutate_all(as.numeric)
If the columns are of factor class, convert them to character first and then to numeric:
df1 %>%
mutate_all(~ as.numeric(as.character(.)))
Also, if none of the cells contain non-numeric text, you can apply type.convert() to the character version of each column; it picks a numeric type (integer or double) automatically:
df1 %>%
mutate_all(~ type.convert(as.character(.), as.is = TRUE))
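In current versions of base R, type.convert() also accepts a whole data frame, so the same idea works without dplyr. A minimal sketch with a made-up data frame of text columns; as.is = TRUE keeps non-numeric text as character instead of turning it into a factor, and whole numbers come back as integer rather than double:

```r
# Hypothetical data frame where numbers were read in as text
df <- data.frame(id    = c("1", "2", "3"),
                 score = c("4.5", "3.0", "5.25"),
                 label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# type.convert() chooses the simplest fitting type for each column
df2 <- type.convert(df, as.is = TRUE)
sapply(df2, class)  # id: "integer", score: "numeric", label: "character"
```

If every column must end up as double, follow this with df2[] <- lapply(df2, as.numeric) on the numeric columns.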
If efficiency matters, one option is data.table, which modifies the columns by reference with set():
library(data.table)
DF1 <- copy(DF)  # DF is the benchmark data frame from the other answer; copy() so setDT() does not modify DF by reference
system.time({
  setDT(DF1)
  for (j in seq_along(DF1)) set(DF1, i = NULL, j = j, value = as.numeric(DF1[[j]]))
})
# user system elapsed
# 0.032 0.005 0.037
In base R we can do:
df[] <- lapply(df, as.numeric)
or, to convert only some of the columns:
df[cols_to_convert] <- lapply(df[cols_to_convert], as.numeric)
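Here cols_to_convert can be any column index: names, positions, or a logical vector. A minimal sketch with a made-up mixed data frame, selecting the columns by name so the text label column is left untouched:

```r
# Hypothetical mixed data frame: two text columns holding numbers, one label column
df <- data.frame(x = c("1", "2"), y = c("3.5", "4"), name = c("p", "q"),
                 stringsAsFactors = FALSE)

# Columns to convert, chosen by name
cols_to_convert <- c("x", "y")

# Convert only those columns; 'name' stays character
df[cols_to_convert] <- lapply(df[cols_to_convert], as.numeric)
str(df)
```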
Here's a benchmark of the solutions (ignoring the considerations about factors):
DF <- data.frame(a = 1:10000, b = letters[1:10000],
c = seq(as.Date("2004-01-01"), by = "week", len = 10000),
stringsAsFactors = TRUE)
DF <- setNames(do.call(cbind, replicate(50, DF, simplify = FALSE)), paste0("V", 1:150))
dim(DF)
# [1] 10000 150
library(dplyr)
n1tk <- function(x) data.frame(data.matrix(x))
mm <- function(x) {x[] <- lapply(x,as.numeric); x}
akrun <- function(x) mutate_all(x, as.numeric)
mo <- function(x) {for(i in 1:150){ x[, i] <- as.numeric(x[, i])}; x}  # return x, otherwise the function returns NULL
microbenchmark::microbenchmark(
akrun = akrun(DF),
n1tk = n1tk(DF),
mo = mo(DF),
mm = mm(DF)
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# akrun 152.9837 177.48150 198.292412 190.38610 206.56800 432.2679 100
# n1tk 10.8700 14.48015 22.632782 17.43660 21.68520 89.4694 100
# mo 9.3512 11.41880 15.313889 14.71970 17.66530 37.6390 100
# mm 4.8294 5.91975 8.906348 7.80095 10.11335 71.2647 100
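One caveat when reading the benchmark: the functions do not give the same values for every column type. data.matrix(), used in n1tk, turns character columns into factors and then into their integer codes, so numbers stored as text are not parsed. A small made-up example:

```r
# Hypothetical data frame with numbers stored as text
df <- data.frame(v = c("10", "9", "100"), stringsAsFactors = FALSE)

# data.matrix(): character -> factor -> integer level codes
# (levels sort as strings: "10", "100", "9")
codes <- data.matrix(df)[, "v"]
codes                 # 1 3 2

# lapply(as.numeric) parses the text itself
df[] <- lapply(df, as.numeric)
df$v                  # 10 9 100
```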