When using a join function in the dplyr package, I get this warning:
Warning message: In left_join_impl(x, y, by$x, by$y) : joining factors with different levels, coercing to character vector
There is not a lot of information online about this. Any idea what it could be? Thanks!
Using merge() to join on different column names: the merge() function from base R can also join two data frames whose key columns have different names. To do so, pass by.x the column(s) to join on in the first data frame and pass by.y the corresponding column(s) in the second data frame.
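As a minimal sketch of the by.x / by.y approach described above (the data frames and column names here are made up for illustration):

```r
# Hypothetical example data: the key is called emp_id in one table and id in the other.
employees <- data.frame(emp_id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
salaries  <- data.frame(id = c(2, 3, 4), salary = c(50, 60, 70))

# Default behaviour is an inner join: only keys present in both tables are kept.
inner <- merge(employees, salaries, by.x = "emp_id", by.y = "id")

# all.x = TRUE keeps every row of the first table (a left join);
# rows without a match get NA in the columns coming from the second table.
left <- merge(employees, salaries, by.x = "emp_id", by.y = "id", all.x = TRUE)
```

The key column in the result takes its name from the first data frame (emp_id here).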
A left join keeps all the records from the first data frame and adds only the matching records from the second data frame.
full_join() returns all rows and all columns from both x and y. Where there are no matching values, it returns NA for the missing side. semi_join() returns all rows from x where there are matching values in y, keeping just the columns from x.
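To see the difference between the two, here is a small sketch with made-up data (the column names id, x_val and y_val are only for illustration):

```r
library(dplyr)

# Hypothetical example data sharing the key column "id".
x <- data.frame(id = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- data.frame(id = c(2, 3, 4), y_val = c("B", "C", "D"))

# full_join(): every row from both tables, with NA where one side has no match.
full <- full_join(x, y, by = "id")

# semi_join(): only the rows of x that have a match in y, and only x's columns.
semi <- semi_join(x, y, by = "id")
```

Here full has one row per distinct id (1 to 4), while semi contains just ids 2 and 3 and no y_val column.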
Joins with dplyr: the dplyr package names its join functions after the corresponding SQL joins. A left join means: include every row from the left data frame (what was the x data frame in merge()) plus the matching rows from the right (y) data frame. If the join columns have the same name, all you need is left_join(x, y).
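A short sketch of both cases, assuming made-up tables orders and regions. When the key names differ, a named vector passed to by maps the column in x to the column in y:

```r
library(dplyr)

# Hypothetical example data; keys are kept as character to avoid factor issues.
orders  <- data.frame(customer = c("ann", "bob", "cy"), amount = c(10, 20, 30),
                      stringsAsFactors = FALSE)
regions <- data.frame(customer = c("ann", "bob"), region = c("north", "south"),
                      stringsAsFactors = FALSE)

# Same key name in both tables: by can be omitted
# (dplyr prints a message saying which column it joined on).
joined <- left_join(orders, regions)

# Different key names: map x's column to y's column with a named vector.
regions2 <- rename(regions, cust_name = customer)
joined2  <- left_join(orders, regions2, by = c("customer" = "cust_name"))
```

Both results keep all three rows of orders, with NA as the region for the customer that has no match.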
That's not an error, that's a warning. And it's telling you that one of the columns you used in your join was a factor and that factor had different levels in the different datasets. In order not to lose any information, the factors were converted to character values. For example:
```r
library(dplyr)

# stringsAsFactors = TRUE is needed on R >= 4.0, where strings are
# no longer converted to factors by default
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)

class(x$a)
# [1] "factor"

# NOTE these are different
levels(x$a)
# [1] "a" "b" "c" "d" "e" "f" "g"
levels(y$a)
# [1] "d" "e" "f" "g" "h" "i" "j"

m <- left_join(x, y)
# Joining by: "a"
# Warning message:
# joining factors with different levels, coercing to character vector

class(m$a)
# [1] "character"
```
You can make sure that both factors have the same levels before merging:

```r
combined <- sort(union(levels(x$a), levels(y$a)))

n <- left_join(mutate(x, a = factor(a, levels = combined)),
               mutate(y, a = factor(a, levels = combined)))
# Joining by: "a"

class(n$a)
# [1] "factor"
```
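Another option, if you would rather not line up the levels by hand, is to do the coercion yourself: convert both key columns to character before the join and turn the result back into a factor afterwards if you need one. This is only a sketch of that idea, not something the warning requires, and the names x_chr and y_chr are made up:

```r
library(dplyr)

x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)

# Convert the keys to character explicitly, so the join has nothing to coerce.
x_chr <- mutate(x, a = as.character(a))
y_chr <- mutate(y, a = as.character(a))

m <- left_join(x_chr, y_chr, by = "a")
key_class_after_join <- class(m$a)

# Optionally turn the key back into a factor once the join is done.
m$a <- factor(m$a)
```

The levels of the rebuilt factor are just the values present in the joined result, so use the union-of-levels approach instead if you need levels that do not occur in the data.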
This warning message will also appear if the joining columns in the two tables have the same levels in a different order:

```r
library(dplyr)
library(forcats)  # for fct_relevel()

# tibble() replaces the deprecated data_frame()
tb1 <- tibble(a = c("a", "b", "c")) %>% mutate(a = as.factor(a))

# Change level order of table tb2's col a
tb2 <- tb1 %>% mutate(a = fct_relevel(a, "c"))

# Check both still factors
tb1$a %>% class()
# [1] "factor"
tb2$a %>% class()
# [1] "factor"

# Check level order
tb1$a %>% levels()
# [1] "a" "b" "c"
tb2$a %>% levels()
# [1] "c" "a" "b"

# Try joining
tb1 %>% left_join(tb2)
# Joining, by = "a"
# Column `a` joining factors with different levels, coercing to character vector
```
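For this case, one way to get rid of the warning is to rebuild one key with the other table's level order before joining. A minimal sketch, where tb2_aligned is a made-up name and the factors are built directly with factor() instead of fct_relevel():

```r
library(dplyr)

tb1 <- tibble(a = factor(c("a", "b", "c")))
# Same values as tb1, but with the level order c, a, b.
tb2 <- tibble(a = factor(c("a", "b", "c"), levels = c("c", "a", "b")))

# Rebuild tb2's key using tb1's level order; the values themselves do not change.
tb2_aligned <- tb2 %>% mutate(a = factor(a, levels = levels(tb1$a)))

joined <- tb1 %>% left_join(tb2_aligned, by = "a")
```

With identical levels on both sides there is nothing to coerce, so the key stays a factor in the result.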