bind_rows in dplyr throwing an unusual error

Hopefully I'm not duplicating an existing question. I'm working on a 32-bit Windows 7 machine with R 3.2.0, dplyr 0.4.1, and RStudio 0.98.1103.

The files in question are two CSV files that originated from the same Oracle table. I read them into x and y with sep = "|", header = TRUE, stringsAsFactors = FALSE. The query used to produce both files pulled exactly the same 29 variables.

> identical(names(x), names(y))
[1] TRUE

However, when I load the dplyr package and try to combine them with dat <- bind_rows(x, y), I get the following error:

> bind_rows(x,y)
Error: incompatible type (data index: 2, column: 'rmnumber', was collecting: integer (dplyr::Collecter_Impl<13>), incompatible with data of type: factor
In addition: Warning messages:
1: In rbind_all(list(x, ...)) :
  Unequal factor levels: coercing to character
2: In rbind_all(list(x, ...)) :
  Unequal factor levels: coercing to character
3: In rbind_all(list(x, ...)) :
  Unequal factor levels: coercing to character

I looked at the column 'rmnumber' and verified that every value in it is either numeric, as expected, or "NA", which is also expected for NULL values in the table. I also tried bind_rows(list(x, y)) and got the same error.
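Checking a column by eye is easy to get wrong on a large file. A minimal base-R sketch of the same check, assuming the column may be character, factor, or integer: `non_integer_values()` is a hypothetical helper (not part of base R or dplyr) that returns the distinct non-missing entries that `as.integer()` would silently turn into `NA`.

```r
# Hypothetical helper: distinct non-missing values that as.integer() cannot
# parse. These are exactly the entries that would become NA on conversion.
non_integer_values <- function(col) {
  col <- as.character(col)                        # treat factors/integers uniformly
  converted <- suppressWarnings(as.integer(col))  # NA wherever parsing fails
  bad <- !is.na(col) & is.na(converted)           # present before, missing after
  unique(col[bad])
}

# Toy column standing in for y$rmnumber (values made up for illustration)
rmnumber <- c("101", "102", NA, "NULL", "12a")
non_integer_values(rmnumber)
```

An empty result means the column should convert to integer without losing any values.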

Base R's 'rbind' works fine on these data frames, with no noticeable loss of precision.
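A likely reason base `rbind` appears to work is that `rbind.data.frame` coerces mismatched columns instead of refusing them, so an integer column combined with a character column comes back as character. A toy sketch (the data frames below are made up for illustration, not the poster's data):

```r
# Two toy data frames whose 'rmnumber' columns disagree on type:
# integer in the first, character in the second.
x <- data.frame(rmnumber = c(1L, 2L))
y <- data.frame(rmnumber = c("3", NA), stringsAsFactors = FALSE)

combined <- rbind(x, y)
class(combined$rmnumber)   # the integer column has been coerced to character
combined$rmnumber          # values are kept, but stored as text
```

No digits are lost, which matches the "no loss of precision" observation, but the result is no longer numeric; `bind_rows()` refuses to make that coercion silently.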

Has anyone seen this error? Do you have any potential solutions outside of using rbind?

Thanks!


I don't think this is helpful, but I constructed my own data frames and, of course, 'bind_rows' worked perfectly:

> x.df <- data.frame(first_name = c("abc"), last_name = c("def"), rmnum = (1:15), addy = ("some_address"))
> y.df <- data.frame(first_name = c("abc"), last_name = c("def"), rmnum = (1:15), addy = ("some_address"))
> bind_rows(x.df, y.df)
Source: local data frame [30 x 4]

   first_name last_name rmnum         addy
1         abc       def     1 some_address
2         abc       def     2 some_address
3         abc       def     3 some_address
4         abc       def     4 some_address
5         abc       def     5 some_address
6         abc       def     6 some_address
7         abc       def     7 some_address
8         abc       def     8 some_address
9         abc       def     9 some_address
10        abc       def    10 some_address
..        ...       ...   ...          ...

Verifying the column classes

> identical(sapply(x, class), sapply(y, class))
[1] FALSE

> class(x$rmnumber);class(y$rmnumber)
[1] "integer"
[1] "character"
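`identical(sapply(x, class), sapply(y, class))` only says that something differs. A small sketch that lists which columns differ and how; `class_mismatches()` is a hypothetical helper and the toy data frames are made up:

```r
# Hypothetical helper: report columns present in both data frames whose
# (first) class differs between them.
class_mismatches <- function(a, b) {
  common <- intersect(names(a), names(b))
  cls_a <- vapply(a[common], function(col) class(col)[1], character(1))
  cls_b <- vapply(b[common], function(col) class(col)[1], character(1))
  differs <- cls_a != cls_b
  data.frame(column = common[differs],
             x = unname(cls_a[differs]),
             y = unname(cls_b[differs]),
             stringsAsFactors = FALSE)
}

x <- data.frame(rmnumber = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
y <- data.frame(rmnumber = c("3", "4"), name = c("c", "d"), stringsAsFactors = FALSE)
class_mismatches(x, y)   # one row: rmnumber, integer vs character
```

With 29 columns, this points straight at the columns that need converting before `bind_rows()`.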

What I cannot figure out is why they differ: the data came out of exactly the same table, and both files were read using exactly the same code.
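One plausible explanation, which is an assumption and not confirmed in the thread: `read.csv` infers each column's type separately for every file it reads. A single value in `y`'s `rmnumber` that is neither a number, a blank, nor the `NA` token (for example, a literal `NULL` written by an export) turns that whole column into character, while the same code on `x` yields an integer column. A self-contained sketch with made-up data:

```r
# Made-up pipe-delimited extracts; only the second contains a literal "NULL".
csv_x <- "rmnumber|name\n101|a\n102|b"
csv_y <- "rmnumber|name\n103|c\nNULL|d"

x <- read.csv(text = csv_x, sep = "|", header = TRUE, stringsAsFactors = FALSE)
y <- read.csv(text = csv_y, sep = "|", header = TRUE, stringsAsFactors = FALSE)
class(x$rmnumber)   # "integer"
class(y$rmnumber)   # "character": "NULL" is not recognised as missing

# Declaring "NULL" as a missing-value token lets the column parse as integer.
y2 <- read.csv(text = csv_y, sep = "|", header = TRUE, stringsAsFactors = FALSE,
               na.strings = c("NA", "NULL"))
class(y2$rmnumber)  # "integer"
```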

Locking in the solution

Big thanks to @Pascal for helping me solve this. A simple data type conversion solved my issue:

> y$rmnumber <- as.integer(y$rmnumber)
> dat2 <- bind_rows(x, y)
> dat2
Source: local data frame [99,884 x 24]
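Converting after the fact works, but `as.integer()` turns any unparseable entry into `NA` (with a warning). An alternative sketch is to declare the column types when the files are read, so both data frames agree before they are combined. `colClasses` is a standard `read.csv` argument; the column names and data below are made up for illustration:

```r
# Fix the column types up front so every extract is read the same way.
col_types <- c(rmnumber = "integer", name = "character")

csv_x <- "rmnumber|name\n101|a\n102|b"
csv_y <- "rmnumber|name\n103|c\n|d"     # empty field is read as a missing value

x <- read.csv(text = csv_x, sep = "|", colClasses = col_types)
y <- read.csv(text = csv_y, sep = "|", colClasses = col_types)
stopifnot(identical(sapply(x, class), sapply(y, class)))

# With matching types no coercion is needed; dplyr::bind_rows(x, y) would
# work here too, base rbind just keeps the sketch dependency-free.
dat <- rbind(x, y)
```

If a value cannot be read as an integer, `read.csv` stops with an error instead of quietly producing a character column, which surfaces the problem at load time rather than at `bind_rows()`.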