 

How to convert an entire data.frame to numeric

Tags: dataframe, r

I want to convert an entire data.frame containing more than 130 columns to numeric.

I know that I need to use as.numeric, but the problem is that I have to apply this function separately to each one of the 130 columns. I tried to apply it to the entire data.frame, but I got the following error message:

Error: (list) object cannot be coerced to type 'double'

How can I do this with relatively short code?

MoMo asked Oct 20 '18 20:10



2 Answers

An option with dplyr

library(dplyr)
df1 %>%
   mutate_all(as.numeric)

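If the columns are of class factor, convert them to character first and then to numeric, because as.numeric called directly on a factor returns its integer level codes rather than the values:

# funs() is deprecated since dplyr 0.8.0; mutate_all(~ as.numeric(as.character(.))) is the lambda form
df1 %>%
    mutate_all(funs(as.numeric(as.character(.))))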
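To see why the detour through character matters for factors, here is a minimal base R sketch. The vector `f` and the data frame `df_demo` are made-up illustration data, not taken from the question:

```r
# Hypothetical example data: numbers stored as a factor.
f <- factor(c("10", "20", "5"))

as.numeric(f)                # 1 2 3    (integer level codes, not the values)
as.numeric(as.character(f))  # 10 20 5  (the values themselves)

# The same conversion applied to every column of a small made-up data frame.
df_demo <- data.frame(x = c("1.5", "2.5"), y = c("10", "3"),
                      stringsAsFactors = TRUE)
df_demo[] <- lapply(df_demo, function(col) as.numeric(as.character(col)))
str(df_demo)
```

The level codes depend on the sort order of the levels, so the silently wrong result of the first call is easy to miss on real data.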

Also, if none of the cells contain non-numeric text, type.convert can be applied to each column after converting it to character, and it will pick the appropriate numeric type:

df1 %>%
    mutate_all(funs(type.convert(as.character(.))))
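As a rough illustration of how type.convert chooses a type, the sketch below runs it on three made-up character vectors. Passing `as.is = TRUE` is my assumption here (it keeps non-numeric input as character instead of creating a factor); the answer above does not set it:

```r
# Hypothetical character data, not taken from the question.
chr_int <- c("1", "2", "3")
chr_dbl <- c("1.5", "2", "3")
chr_txt <- c("1", "b", "3")

# as.is = TRUE: non-numeric input stays character rather than becoming a factor.
class(type.convert(chr_int, as.is = TRUE))  # "integer"
class(type.convert(chr_dbl, as.is = TRUE))  # "numeric"
class(type.convert(chr_txt, as.is = TRUE))  # "character"

# as.numeric() instead forces the conversion and turns "b" into NA (with a warning).
forced <- suppressWarnings(as.numeric(chr_txt))
forced                                      # 1 NA 3
```

So type.convert is the more conservative choice when a column might contain stray text, while as.numeric always returns a numeric vector and marks what it could not parse as NA.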

If efficiency matters, one option is data.table

library(data.table)
DF1 <- copy(DF) # DF is the benchmark data frame built in the other answer
system.time({setDT(DF1)
    for(j in seq_along(DF1)) set(DF1, i = NULL, j=j, value = as.numeric(DF1[[j]]))
  })
#   user  system elapsed 
#  0.032   0.005   0.037 
akrun answered Sep 28 '22 13:09


In base R we can do:

df[] <- lapply(df, as.numeric)

or

df[cols_to_convert]  <- lapply(df[cols_to_convert], as.numeric)
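A small sketch of why the `[]` matters and why calling as.numeric on the whole data frame fails. The data frame is made up, and building `cols_to_convert` with an `is.character` test is just one possible choice, assumed here for illustration:

```r
# Made-up data frame with numbers stored as text.
df <- data.frame(id = c("1", "2", "3"),
                 score = c("4.5", "6", "7.25"),
                 n = 1:3,
                 stringsAsFactors = FALSE)

# as.numeric(df) fails: a data frame is a list of columns, not a vector.
res <- tryCatch(as.numeric(df), error = function(e) conditionMessage(e))

# Without [] the result of lapply() is a plain list ...
as_list <- lapply(df, as.numeric)
class(as_list)  # "list"

# ... while assigning into df[...] replaces the columns and keeps the data.frame.
# Assumption: convert only the character columns.
cols_to_convert <- names(df)[sapply(df, is.character)]
df[cols_to_convert] <- lapply(df[cols_to_convert], as.numeric)
str(df)
```

Here `res` holds the error message from the failed call, which is the same kind of coercion error quoted in the question.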

Here's a benchmark of the solutions (ignoring the considerations about factors):

DF <- data.frame(a = 1:10000, b = letters[1:10000],
                 c = seq(as.Date("2004-01-01"), by = "week", len = 10000),
                 stringsAsFactors = TRUE)
DF <- setNames(do.call(cbind, replicate(50, DF, simplify = FALSE)), paste0("V", 1:150))

dim(DF)
# [1] 10000   150

library(dplyr)
n1tk  <- function(x) data.frame(data.matrix(x))
mm    <- function(x) {x[] <- lapply(x,as.numeric); x}
akrun <- function(x) mutate_all(x, as.numeric)
mo    <- function(x)  {for(i in 1:150){ x[, i] <- as.numeric(x[, i])}; x}

microbenchmark::microbenchmark(
  akrun = akrun(DF),
  n1tk  = n1tk(DF),
  mo    = mo(DF),
  mm    = mm(DF)
)

# Unit: milliseconds
#   expr      min        lq       mean    median        uq      max neval
#  akrun 152.9837 177.48150 198.292412 190.38610 206.56800 432.2679   100
#   n1tk  10.8700  14.48015  22.632782  17.43660  21.68520  89.4694   100
#     mo   9.3512  11.41880  15.313889  14.71970  17.66530  37.6390   100
#     mm   4.8294   5.91975   8.906348   7.80095  10.11335  71.2647   100
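Since the benchmark only compares timings, it may be worth checking that the approaches also agree on values. As far as I can tell from the data.matrix documentation, character columns are first converted to factors and then to integer codes, so the data.matrix route can differ from as.numeric on text columns. A minimal sketch with a made-up one-column data frame:

```r
# Hypothetical data: numbers stored as character strings.
d <- data.frame(a = c("10", "20", "5"), stringsAsFactors = FALSE)

# data.matrix() route (as in n1tk above): character -> factor -> integer codes.
via_matrix <- data.frame(data.matrix(d))
via_matrix$a   # 1 2 3

# lapply() + as.numeric route (as in mm above): the strings are parsed as numbers.
via_lapply <- d
via_lapply[] <- lapply(via_lapply, as.numeric)
via_lapply$a   # 10 20 5
```

For columns that are already integer or double the two routes give the same numbers, so the difference only shows up with character or factor input.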
Moody_Mudskipper answered Sep 28 '22 12:09