The dataset below has the same characteristics as my large dataset. I am managing it in data.table. Some columns are loaded as chr even though they contain numbers, and I want to convert them to numeric. The names of these columns are known.
dt = data.table(A=LETTERS[1:10],B=letters[1:10],C=as.character(runif(10)),D = as.character(runif(10))) # simplified version
strTmp = c('C','D') # Name of columns to be converted to numeric
# columns converted to numeric and returned a 10 x 2 data.table
dt.out1 <- dt[,lapply(.SD, as.numeric, na.rm = T), .SDcols = strTmp]
I am able to convert those two columns to numeric with the code above; however, I want to update dt itself instead. I tried using :=, but it didn't work. I need help here!
dt.out2 <- dt[, strTmp:=lapply(.SD, as.numeric, na.rm = T), .SDcols = strTmp] # returned a 10 x 6 data.table (2 columns extra)
I even tried the code below (written as for a data.frame; not my ideal solution even if it works, as I am worried that in some cases the column order might change), but it still doesn't work. Can someone explain why it doesn't work, please?
dt[,strTmp,with=F] <- dt[,lapply(.SD, as.numeric, na.rm = T), .SDcols = strTmp]
Thanks in advance!
You don't need to assign the whole data.table if you assign by reference with := (i.e., you don't need dt.out2 <-).
You need to wrap the LHS of := in parentheses to make sure it is evaluated (and not used as the name).
Like this:
dt[, (strTmp) := lapply(.SD, as.numeric), .SDcols = strTmp]
str(dt)
#Classes ‘data.table’ and 'data.frame': 10 obs. of 4 variables:
# $ A: chr "A" "B" "C" "D" ...
# $ B: chr "a" "b" "c" "d" ...
# $ C: num 0.30204 0.00269 0.46774 0.08641 0.02011 ...
# $ D: num 0.151 0.0216 0.5689 0.3536 0.26 ...
# - attr(*, ".internal.selfref")=<externalptr>
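To make the role of the parentheses concrete, here is a minimal sketch on a tiny table. The names dt_demo and cols are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy data; dt_demo and cols are illustrative names only.
dt_demo <- data.table(A = c("x", "y"), C = c("1.5", "2.5"), D = c("3", "4"))
cols <- c("C", "D")

# (cols) is evaluated to the character vector c("C", "D"), so the existing
# columns C and D are overwritten by reference; no new columns are added.
dt_demo[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]

# Column classes after the update: A stays character, C and D are numeric.
col_classes <- sapply(dt_demo, class)
print(col_classes)
```

Because the update happens by reference, dt_demo itself is modified and nothing needs to be assigned back.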
While Roland's answer is more idiomatic, you can also consider set within a loop for something as direct as this. An approach might be something like:
strTmp = c('C','D')
ind <- match(strTmp, names(dt))
for (i in seq_along(ind)) {
set(dt, NULL, ind[i], as.numeric(dt[[ind[i]]]))
}
str(dt)
# Classes ‘data.table’ and 'data.frame': 10 obs. of 4 variables:
# $ A: chr "A" "B" "C" "D" ...
# $ B: chr "a" "b" "c" "d" ...
# $ C: num 0.308 0.564 0.255 0.828 0.128 ...
# $ D: num 0.635 0.0485 0.6281 0.4793 0.7 ...
# - attr(*, ".internal.selfref")=<externalptr>
From the help page at ?set, this would avoid some of the [.data.table overhead if that ever becomes a problem for you.
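If the worry is that column positions might shift, one variation is to pass the column names to set directly instead of looking up indices first. This is only a sketch, and dt_demo and cols are hypothetical names used for illustration.

```r
library(data.table)

# Hypothetical toy data; dt_demo and cols are illustrative names only.
dt_demo <- data.table(A = c("x", "y"), C = c("1.5", "2.5"), D = c("3", "4"))
cols <- c("C", "D")

# set() accepts a column name for j, so each column is addressed by name
# and converted in place, independent of where it sits in the table.
for (col in cols) {
  set(dt_demo, j = col, value = as.numeric(dt_demo[[col]]))
}

str(dt_demo)
```

Addressing columns by name means the loop keeps working even if columns are later reordered.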