I write code that runs in high throughput production environments expected to handle a variety of input from multiple users. In this environment, I often need to join (with dplyr) two multi-column dataframes with join columns of mismatched types which produces this error. "Can't join on ... because of incompatible types"
The production code is expected to handle input csvs that are 150+ columns by 0-1000 rows, with 12-20 join columns. I use read.table for speed, and because the content of the 150 columns can vary, allow type to be inferred by data content.
It would be great to have a general-purpose function that matches join column types programmatically, making no assumptions about column name or location.
This is a frequently viewed question, so many others must run into the error, so deserves a more complete answer.
The simple solution for correcting this join error is to simply mutate the class of the column(s) causing the problem. This can be done as follows:
mutate the column class to match using as.numeric, as.logical or as.character. For example:
df2 <- df2 %>%
mutate(column1 = as.numeric(column1))
A solution for production environments is in the matchColClasses function shown, which does the following:
Reassign column classes in df2 to match df1
matchColClasses <- function(df1, df2) {
sharedColNames <- names(df1)[names(df1) %in% names(df2)]
sharedColTypes <- sapply(df1[,sharedColNames], class)
for (n in sharedColNames) {
class(df2[, n]) <- sharedColTypes[n]
}
return(df2)
}
This function works well in our production environment, with heterogenous data types; character, numeric and logical.
If the problem is that the classes of columns are not correctly detected after reading the data - you can specify the classes for each column at the data reading step.
So with read.csv()
function you would do it like this:
df1 <- read.csv("data.csv", colClasses=c("numeric", "Date", "character"))
df2 <- read.csv("data.csv", colClasses=c("numeric", "Date", "character"))
After this df1
and df2
will have columns of the same types.
The classes here are just an example, you will have to specify the ones that apply to your data.
Also take a look at help(read.table)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With