
dplyr join warning: joining factors with different levels

Tags:

r

When using the join function in the dplyr package, I get this warning:

Warning message:
In left_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector

There is not a lot of information online about this. Any idea what it could be? Thanks!

Christopher Yee asked May 26 '15 20:05



2 Answers

That's a warning, not an error. It tells you that one of the columns used in the join was a factor, and that the factor had different levels in the two datasets. To avoid losing any information, the factors were converted to character values. For example:

library(dplyr)

# stringsAsFactors = TRUE is needed on R >= 4.0.0, where data.frame()
# no longer converts strings to factors by default
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)

class(x$a)
# [1] "factor"

# NOTE these are different
levels(x$a)
# [1] "a" "b" "c" "d" "e" "f" "g"
levels(y$a)
# [1] "d" "e" "f" "g" "h" "i" "j"

m <- left_join(x, y)
# Joining by: "a"
# Warning message:
# joining factors with different levels, coercing to character vector

class(m$a)
# [1] "character"

You can make sure that both factors have the same levels before merging:

combined <- sort(union(levels(x$a), levels(y$a)))
n <- left_join(mutate(x, a = factor(a, levels = combined)),
               mutate(y, a = factor(a, levels = combined)))
# Joining by: "a"
class(n$a)
# [1] "factor"
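If the result does not need to be a factor, another option is to convert the join column to character on both sides before joining, so there is nothing left to coerce. This is a minimal sketch and not part of the original answer; the extra column `b` and the names `x_chr` and `y_chr` are made up for illustration.

```r
library(dplyr)

# Same kind of data as above; b is a hypothetical payload column
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], b = 4:10, stringsAsFactors = TRUE)

# Convert the key to character explicitly on both sides
x_chr <- mutate(x, a = as.character(a))
y_chr <- mutate(y, a = as.character(a))

# Both keys are now plain character vectors, so no factor coercion is involved
m <- left_join(x_chr, y_chr, by = "a")
class(m$a)
```

The trade-off is that the level information is dropped; if the factor is needed later, it can be rebuilt with `factor()` after the join.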
MrFlick answered Sep 28 '22 03:09


This warning also appears when the join columns in the two tables have the same levels but in a different order:

library(dplyr)
library(forcats)  # provides fct_relevel()

# tibble() replaces the deprecated data_frame()
tb1 <- tibble(a = c("a", "b", "c")) %>% mutate(a = as.factor(a))
# Change level order of table tb2's col a
tb2 <- tb1 %>% mutate(a = fct_relevel(a, "c"))

# Check both still factors
tb1$a %>% class()
# [1] "factor"
tb2$a %>% class()
# [1] "factor"

# Check level order
tb1$a %>% levels()
# [1] "a" "b" "c"
tb2$a %>% levels()
# [1] "c" "a" "b"

# Try joining
tb1 %>% left_join(tb2)
# Joining, by = "a"
# Column `a` joining factors with different levels, coercing to character vector
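To tell the two situations apart before joining, the level vectors can be compared directly. The following is a rough sketch in base R, not taken from the answer above; `f1`, `f2`, and `f2_aligned` are illustrative names standing in for the two join columns.

```r
# Two factors with the same levels in a different order
f1 <- factor(c("a", "b", "c"))
f2 <- factor(c("a", "b", "c"), levels = c("c", "a", "b"))

# Same set of levels, ignoring order?
same_set <- setequal(levels(f1), levels(f2))

# Same levels in the same order?
same_order <- identical(levels(f1), levels(f2))

# Re-level f2 to follow f1's order; the values themselves are unchanged
f2_aligned <- factor(f2, levels = levels(f1))
identical(levels(f1), levels(f2_aligned))
```

When `same_set` is true but `same_order` is false, re-levelling one side as shown should be enough to make the level vectors match.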
Jiaxiang answered Sep 28 '22 02:09