Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can't use dplyr joins on two date columns

Tags:

r

I'm encountering an error where I try to join two data frames using the dplyr join functions by two Date columns. This is the error I get:

Error: cannot join on columns 'DateInfo' x 'DateInfo': Can't join on 'DateInfo' x 'DateInfo' because of incompatible types (Date / Date)

The base merge function works fine and I can't seem to find an example of what could be causing this through googling or other stack overflow questions.

The problem is I can't create a reproducible example and the data I am using I can't share. For example this works with no problems:

d1 <- data_frame(Num = 1:5, DateInfo = as.Date(c("2014-01-03", "2014-04-05", "2015-01-03", "2014-04-02", "2011-07-28"), format = "%Y-%m-%d"))
d2 <- data_frame(Name = c("a", "b", "c", "d", "e"), DateInfo = as.Date(c("2014-01-03", "2014-04-05", "2015-01-03", "2014-04-02", "2011-07-28"), format = "%Y-%m-%d"))
d3 <- left_join(d1, d2, by = c("DateInfo" = "DateInfo"))

Has anyone had any experience with not being able to join on two columns that are, as far as the class function is concerned, are the same type but still getting this error?

EDIT: Just to get this out of the way I can get around this error by using merge or converting the dates to characters and then joining, so I'm really just interested in why dplyr would tell me I can't merge on two columns with the same type.

like image 658
Matt Mills Avatar asked Aug 18 '15 20:08

Matt Mills


People also ask

Can you join on two columns in R?

Using base merge() to Join Multiple ColumnsUsing merge() function from the R base can also be used to perform joining on multiple columns of data frame. To do so you need to create a vector for by. x with the columns you wanted to join on and create a similar vector for by. y .

How do I join tables in dplyr?

Joins with dplyr. The dplyr package uses SQL database syntax for its join functions. A left join means: Include everything on the left (what was the x data frame in merge() ) and all rows that match from the right (y) data frame. If the join columns have the same name, all you need is left_join(x, y) .

What is inner join dplyr?

dplyr package provides several functions to join data frames in R. In R, Inner join or natural join is the default join and it's mostly used joining data frames, it is used to join data. frames on a specified single or multiple columns, and where column values don't match the rows get dropped from both data.

What is full join in R?

Full join: The full outer join returns all of the records in a new table, whether it matches on either the left or right tables. If the table rows match, then a join will be executed, otherwise it will return NULL in places where a matching row does not exist.


2 Answers

The reason I can't merge is how the two Date objects are stored. Thanks to this issue I decided to check the structure of how the two objects are stored and sure enough one is stored as an integer and one is stored as a numeric:

> dput(df1$DateInfo[1])
structure(16373, class = "Date")
> dput(df2$DateInfo[1])
structure(16372L, class = "Date")

It appears that the data that was pulled form a DB through the dplyr sql functions is stored as a numeric while the data from a csv is stored as an integer. I don't know why that won't let dplyr join on them while merge can or why it happens in the first place but I think this specific question is answered.

like image 109
Matt Mills Avatar answered Oct 19 '22 12:10

Matt Mills


I just had this exact same issue. Two data frames, each with a POSIXct date_time column and the dplyr join functions (by = "date_time") would not work due to incompatible types. Thanks to Matt Mills, I used the dput function to investigate the POSIXct columns and found that, even though both were POSIXct, one came out numeric and the other was character.

I fixed this by going back to where I created my POSIXct object and used this code:

df_temp <- df_temp %>% 
mutate(date_time = as.numeric(date_time)) %>% 
mutate(date_time = as.POSIXct(date_time, tz = tz_in, origin = "1970-01-01 00:00:00"))

Its weird...its like the POSIXct format remembers its original type. My added code forced the date_time fields in both variables to be numeric before converting to POSIXct.

dplyr::inner_join now works. Thanks for this thread; saved my bacon. ;)

like image 40
Paul Avatar answered Oct 19 '22 14:10

Paul