How to convert character to numeric within data.table for specific columns?

Tags: r, data.table

The dataset below has the characteristics of my large dataset. I am managing it in data.table. Some columns are loaded as chr even though they contain numbers, and I want to convert them to numeric. The names of these columns are known.

library(data.table)
dt = data.table(A=LETTERS[1:10],B=letters[1:10],C=as.character(runif(10)),D = as.character(runif(10))) # simplified version
strTmp = c('C','D') # Name of columns to be converted to numeric

# columns converted to numeric and returned a  10 x 2 data.table
dt.out1 <- dt[,lapply(.SD, as.numeric, na.rm = T), .SDcols = strTmp]

I am able to convert those 2 columns to numeric with the code above; however, I want to update dt in place instead. I tried using :=, but it didn't work. I need help here!

dt.out2 <- dt[, strTmp:=lapply(.SD, as.numeric, na.rm = T), .SDcols = strTmp] # returned a 10 x 6 data.table (2 columns extra)

I also tried the code below, written the way one would for a data.frame. It is not my ideal solution even if it works, because I am worried that the column order might change in some cases, but it doesn't work either. Can someone explain why it doesn't work?

dt[,strTmp,with=F] <- dt[,lapply(.SD, as.numeric, na.rm = T), .SDcols = strTmp]

Thanks in advance!

Asked by Lafayette, Apr 07 '15 15:04



2 Answers

  1. You don't need to assign the whole data.table if you assign by reference with := (i.e., you don't need dt.out2 <-).

  2. You need to wrap the LHS of := in parentheses to make sure it is evaluated (and not used as the name).

Like this:

dt[, (strTmp) := lapply(.SD, as.numeric), .SDcols = strTmp]
str(dt)
#Classes ‘data.table’ and 'data.frame': 10 obs. of  4 variables:
# $ A: chr  "A" "B" "C" "D" ...
# $ B: chr  "a" "b" "c" "d" ...
# $ C: num  0.30204 0.00269 0.46774 0.08641 0.02011 ...
# $ D: num  0.151 0.0216 0.5689 0.3536 0.26 ...
# - attr(*, ".internal.selfref")=<externalptr> 
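
To make point 2 concrete, here is a minimal sketch on made-up data. The variable name cols and the toy values are assumptions for illustration, not part of the question. It contrasts a bare symbol on the left of := (taken as a literal column name) with the parenthesised form (evaluated to the names it holds):

```r
library(data.table)

# Toy data; the values are invented for this illustration.
dt <- data.table(A = LETTERS[1:3], C = c("0.1", "0.2", "0.3"), D = c("1", "2", "3"))
cols <- c("C", "D")  # hypothetical variable holding the column names

# Bare symbol on the LHS: data.table adds a column literally named "cols".
dt_bare <- copy(dt)[, cols := 0]
names(dt_bare)

# Parenthesised LHS: the variable is evaluated, so C and D are updated by reference.
dt[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
sapply(dt, class)
```

The copy() call is only there so the first experiment does not modify dt itself.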

Answered by Roland, Oct 16 '22 18:10


While Roland's answer is more idiomatic, you can also consider set within a loop for something as direct as this. An approach might be something like:

strTmp = c('C','D')
ind <- match(strTmp, names(dt))

for (i in seq_along(ind)) {
  set(dt, NULL, ind[i], as.numeric(dt[[ind[i]]]))
}

str(dt)
# Classes ‘data.table’ and 'data.frame':  10 obs. of  4 variables:
#  $ A: chr  "A" "B" "C" "D" ...
#  $ B: chr  "a" "b" "c" "d" ...
#  $ C: num  0.308 0.564 0.255 0.828 0.128 ...
#  $ D: num  0.635 0.0485 0.6281 0.4793 0.7 ...
#  - attr(*, ".internal.selfref")=<externalptr> 

According to the help page at ?set, this avoids some of the [.data.table overhead, should that ever become a problem for you.
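
As a possible variation on the loop above, set() also accepts a column name for j, so the match() step can be skipped. The sketch below uses made-up toy data in place of the question's dt and is only one way to write it:

```r
library(data.table)

# Toy data standing in for the question's dt (values invented for illustration).
dt <- data.table(A = LETTERS[1:3], C = c("0.5", "1.5", "2.5"), D = c("10", "20", "30"))
strTmp <- c("C", "D")

# Loop over the column names directly; each column is replaced by reference.
for (col in strTmp) {
  set(dt, i = NULL, j = col, value = as.numeric(dt[[col]]))
}

sapply(dt, class)
```

Looping over names rather than positions also sidesteps any concern about the column order changing.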

Answered by A5C1D2H2I1M1N2O1R2T1, Oct 16 '22 19:10