
Convert columns of arbitrary class to the class of matching columns in another data.table

Question:

I'm working in R. I want the shared columns of 2 data.tables (shared meaning same column name) to have matching classes. I'm struggling with a way to generically convert an object of unknown class to the unknown class of another object.


More context:

I know how to set the class of a column in a data.table, and I know about the as function. Also, this question isn't entirely data.table specific, but it comes up often when I use data.tables. Further, assume that the desired coercion is possible.

I have 2 data.tables. They share some column names, and those columns are intended to represent the same information. For the column names shared by table A and table B, I want the classes of A to match those in B (or the other way around).


Example data.tables:

A <- structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L)), .Names = c("year", "stratum"), row.names = c(NA, -45L), class = c("data.table", "data.frame"))

B <- structure(list(year = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L), bt = c(-9.95187702337873, -9.48946944434626, -9.74178662514147, -5.36167545158338, -4.76405522202426, -5.41964239804882, -0.0807951335119085, 0.520481719699774, 0.0393874225863578, 5.40557402913123, 5.47927931969583, 5.37228402911139, 9.82774396910091, 9.89629694010177, 9.98105260936272, -9.82469892896284, -9.42530210357904, -9.66171049964775, -5.17540952901709, -4.81859082470115, -5.3577146169737, -0.0685310909609001, 0.441383303157166, -0.0105897444321987, 5.24205882775199, 5.65773605162835, 5.40217185632441, 9.90299445851434, 9.78883672575814, 9.98747998379124, -9.69843398105195, -9.31530717395811, -9.77406601252698, -4.83080164375344, -4.89056304189872, -5.3904000267275, -0.121508487954861, 0.493798577602088, -0.118550709142654, 5.23654772583187, 5.87760447006892, 5.22478092346285, 9.90949768116403, 9.85433376398086, 9.91619307289277), yr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Names = c("year", "stratum", "bt", "yr"), row.names = c(NA, -45L), class = c("data.table", "data.frame"), sorted = c("year", "stratum"))

Here's what they look like:

> A  
    year stratum
 1:    1       1
 2:    1       2
 3:    1       3
 4:    1       4

> B
    year stratum          bt yr
 1:    1       1 -9.95187702  1
 2:    1       2 -9.48946944  1
 3:    1       3 -9.74178663  1
 4:    1       4 -5.36167545  1

Here are the classes:

> sapply(A, class)
     year   stratum 
"integer" "integer"

> sapply(B, class)
     year   stratum        bt        yr 
"numeric" "integer" "numeric" "numeric"

Manually, I can accomplish the desired task through the following:

A[,year:=as.numeric(year)]

This is easy when there's only 1 column to change, you know that column ahead of time, and you know the desired class ahead of time. It's also pretty easy to convert arbitrary columns to a given class.
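For reference, converting a known set of columns to a known class can be done with a short loop over set(). This is a minimal sketch, assuming the data.table package is installed; the table DT and its values are made up for illustration:

```r
library(data.table)

# Toy table; names and values are illustrative only
DT <- data.table(year = 1:3, stratum = 4:6, bt = c(0.5, 1.5, 2.5))

# Both the columns to convert and the target class are known in advance
cols <- c("year", "stratum")
for (col in cols) {
  # set() replaces the whole column by reference
  set(DT, j = col, value = as.numeric(DT[[col]]))
}

sapply(DT, class)
```

The hard part of the question is what to do when neither cols nor the as.numeric call can be written down ahead of time.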


My Failed Attempt:

(EDIT: This actually works; see my answer)

s2c <- function (x, type = "list") 
{
    as.call(lapply(c(type, x), as.symbol))
}

# In this case, I can assume all columns of A can be found in B
# I am also able to assume that the desired conversion is possible
B.class <- sapply(B[,eval(s2c(names(A)))], class) 
for(col in names(A)){
    set(A, j=col, value=as(A[[col]], B.class[col]))
}

But this still returns the year column as "integer", not "numeric":

> sapply(A, class)
     year   stratum 
"integer" "integer" 

The problem in the above example is that class(as(1L, "numeric")) still returns "integer". On the other hand, class(as.numeric(1L)) returns "numeric"; however, I don't know ahead of time that as.numeric is needed.


Question, Restated:

How do I make the column classes match, when neither columns nor the to/from classes are known ahead of time?


Additional Thoughts:

In a way, the question is mostly about arbitrary class matching. I run into this issue often with data.table because it's very vocal about class matching. E.g., I run into similar problems when I need to insert an NA of the appropriate type (NA_real_ vs NA_character_, etc.), depending on the class of the column (see the related question/issue in This Question).
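For the typed-NA case specifically, one possible shortcut is to index the existing column with NA_integer_, which yields a length-one NA of that vector's own type. A small base R sketch; the helper name typed_na is hypothetical:

```r
# Hypothetical helper: indexing with NA_integer_ returns a single NA
# that has the same type as the vector being indexed
typed_na <- function(x) x[NA_integer_]

typeof(typed_na(c(1.5, 2)))    # "double"
typeof(typed_na(c("a", "b")))  # "character"
typeof(typed_na(1:3))          # "integer"
```

This avoids having to choose between NA_real_, NA_character_, and so on by hand.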

Again, this question can be seen as a general issue of converting between arbitrary classes that aren't known in advance. In the past, I've written functions using switch to do something like switch(class(x), numeric = as.numeric(...), character = as.character(...), ...), but that seems a bit ugly. The only reason I'm bringing this up in the context of data.table is because it's where I most often encounter the need for this type of functionality.
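The switch-based approach described above might look roughly like the following. This is only a sketch in base R; the helper name coerce_like and the set of handled classes are assumptions, and every new class needs its own branch, which is what makes the pattern awkward:

```r
# Hypothetical helper: coerce `x` to the class of `ref` via an explicit switch
coerce_like <- function(x, ref) {
  target <- class(ref)[1L]  # use the first class if there are several
  switch(target,
    numeric   = as.numeric(x),
    integer   = as.integer(x),
    character = as.character(x),
    logical   = as.logical(x),
    stop("no conversion defined for class: ", target)
  )
}

class(coerce_like(1L, 2.5))  # "numeric"
class(coerce_like(1L, "a"))  # "character"
```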

asked Dec 04 '15 15:12 by rbatt


1 Answer

Not very elegant but you may 'build' the as.* call like this:

for (x in colnames(A)) {
  # [[x]] selects the column named by x; on a data.table, [, x] does not
  A[[x]] <- eval(call(paste0("as.", class(B[[x]])[1L]), A[[x]]))
}
answered Sep 20 '22 18:09 by Tensibai
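As a possible extension of the same idea (not part of the answer as posted), the as.* lookup can be wrapped in a function that only touches shared columns and updates them by reference. This is a sketch, assuming the data.table package is installed; match_classes is a hypothetical name, and it assumes an as.<class> function exists for each target class:

```r
library(data.table)

# Hypothetical helper: coerce the columns of `x` that also appear in `ref`
# to the first class of the corresponding column in `ref`
match_classes <- function(x, ref) {
  shared <- intersect(names(x), names(ref))
  for (col in shared) {
    target <- class(ref[[col]])[1L]
    if (!inherits(x[[col]], target)) {
      # Look up as.<class> (e.g. as.numeric); match.fun() raises an error
      # if no function with that name exists
      coerce <- match.fun(paste0("as.", target))
      set(x, j = col, value = coerce(x[[col]]))
    }
  }
  invisible(x)
}

# Small stand-ins for the tables in the question
A <- data.table(year = c(1L, 2L, 3L), stratum = c(1L, 2L, 3L))
B <- data.table(year = c(1, 2, 3), stratum = c(1L, 2L, 3L), bt = c(-9.9, 0.5, 9.8))

match_classes(A, B)
sapply(A, class)
```

Classes without a matching as.* function, or with constructors that need extra arguments (factor levels, date formats), would need special handling that this sketch does not attempt.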