Let's say I have two data frames I want to bind:
ds_a <- data.frame(
  x = 1:6,
  y = 5,
  z = "4",
  l = 2,
  stringsAsFactors = FALSE
)
ds_b <- data.frame(
  x = as.factor(1:6),
  y = "5",
  p = 2,
  stringsAsFactors = FALSE
)
When I try to bind them I get the following error:
> bind_rows(ds_a, ds_b)
Error: Can't combine `..1$x` <integer> and `..2$x` <factor<4c79c>>.
Typically, I solve this by converting all the columns in both data frames to character, binding the two data frames, and then manually converting the columns back to their original types.
Is there a way to simply coerce all the type collisions between ds_a and ds_b by automatically casting ds_b's columns to match ds_a (assuming they're named the same)?
More generally, I'd like a solution that automatically converts all the columns in ds_b to the type of ds_a wherever the column names match. The solution should also work if ds_b and ds_a don't share all the same columns (just filling with NA when a column exists in one but not in the other).
Here's the intended outcome:
ds_merged <- read.table(text = 'x y z l p
1 1 5 4 2 NA
2 2 5 4 2 NA
3 3 5 4 2 NA
4 4 5 4 2 NA
5 5 5 4 2 NA
6 6 5 4 2 NA
7 1 5 NA NA 2
8 2 5 NA NA 2
9 3 5 NA NA 2
10 4 5 NA NA 2
11 5 5 NA NA 2
12 6 5 NA NA 2', header = TRUE, row.names = NULL)
> ds_merged
row.names x y z l p
1 1 1 5 4 2 NA
2 2 2 5 4 2 NA
3 3 3 5 4 2 NA
4 4 4 5 4 2 NA
5 5 5 5 4 2 NA
6 6 6 5 4 2 NA
7 7 1 5 NA NA 2
8 8 2 5 NA NA 2
9 9 3 5 NA NA 2
10 10 4 5 NA NA 2
11 11 5 5 NA NA 2
12 12 6 5 NA NA 2
We could use type.convert().
Explanation (added after the OP's comment): type.convert does not consider ds_a. You can check this by comparing glimpse(ds_a) with glimpse of the resulting data frame (result). Note that the columns of ds_a have the same classes in result.
> # compare classes
> glimpse(ds_a)
Rows: 6
Columns: 4
$ x <int> 1, 2, 3, 4, 5, 6
$ y <dbl> 5, 5, 5, 5, 5, 5
$ z <chr> "4", "4", "4", "4", "4", "4"
$ l <dbl> 2, 2, 2, 2, 2, 2
> glimpse(ds_b)
Rows: 6
Columns: 3
$ x <fct> 1, 2, 3, 4, 5, 6
$ y <chr> "5", "5", "5", "5", "5", "5"
$ p <dbl> 2, 2, 2, 2, 2, 2
> glimpse(result)
Rows: 12
Columns: 5
$ x <int> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6
$ y <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
$ z <chr> "4", "4", "4", "4", "4", "4", NA, NA, NA, NA, NA, NA
$ l <dbl> 2, 2, 2, 2, 2, 2, NA, NA, NA, NA, NA, NA
$ p <int> NA, NA, NA, NA, NA, NA, 2, 2, 2, 2, 2, 2
What type.convert does is:
1. It is applied to ds_b only (notice that the %>% is inside bind_rows).
2. All values of ds_b$x are integers, so R converts ds_b$x from class factor to class integer.
3. The values of ds_b$y are of class character but are integers in nature, so R converts ds_b$y from class character to class integer. This can be misleading.
4. Now ds_a$y is of class double and ds_b$y is of class integer, but this is no problem for R and bind_rows: here, double overrides integer.
> # showing what type.convert does to ds_b
> ds_b$x <- as.integer(as.character(ds_b$x)) # as.integer() on a factor alone returns the level codes
> ds_b$y <- as.integer(ds_b$y)
> ds_b %>%
+ as_tibble()
# A tibble: 6 x 3
x y p
<int> <int> <dbl>
1 1 5 2
2 2 5 2
3 3 5 2
4 4 5 2
5 5 5 2
6 6 5 2
> bind_rows(ds_a, ds_b) %>%
+ as_tibble()
# A tibble: 12 x 5
x y z l p
<int> <dbl> <chr> <dbl> <dbl>
1 1 5 4 2 NA
2 2 5 4 2 NA
3 3 5 4 2 NA
4 4 5 4 2 NA
5 5 5 4 2 NA
6 6 5 4 2 NA
7 1 5 NA NA 2
8 2 5 NA NA 2
9 3 5 NA NA 2
10 4 5 NA NA 2
11 5 5 NA NA 2
12 6 5 NA NA 2
Unlike the manual conversion shown above, type.convert also converts ds_b$p from class double to class integer, because the data are integer in nature.
Solution:
library(dplyr)
bind_rows(ds_a, ds_b %>% type.convert(as.is=TRUE))
output:
x y z l p
1 1 5 4 2 NA
2 2 5 4 2 NA
3 3 5 4 2 NA
4 4 5 4 2 NA
5 5 5 4 2 NA
6 6 5 4 2 NA
7 1 5 <NA> NA 2
8 2 5 <NA> NA 2
9 3 5 <NA> NA 2
10 4 5 <NA> NA 2
11 5 5 <NA> NA 2
12 6 5 <NA> NA 2
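To make the point above concrete (type.convert looks only at the values in ds_b, never at ds_a), here is a minimal base-R sketch that compares the classes of the shared columns after conversion. The names ds_b_converted and class_comparison are introduced here for illustration only.

```r
ds_a <- data.frame(x = 1:6, y = 5, z = "4", l = 2, stringsAsFactors = FALSE)
ds_b <- data.frame(x = as.factor(1:6), y = "5", p = 2, stringsAsFactors = FALSE)

# type.convert() guesses each column's type from the values of ds_b alone.
ds_b_converted <- type.convert(ds_b, as.is = TRUE)

# Compare the classes of the columns that both data frames share.
common_cols <- intersect(names(ds_a), names(ds_b))
class_comparison <- data.frame(
  column = common_cols,
  ds_a = sapply(ds_a[common_cols], class),
  ds_b_converted = sapply(ds_b_converted[common_cols], class),
  row.names = NULL
)
print(class_comparison)
```

The classes of y still differ (numeric in ds_a, integer after conversion); the bind works only because bind_rows can combine integer and double. If a column of ds_a were character while the matching column of ds_b held number-like strings, type.convert would still pick a numeric type, so this approach is not a general "cast to ds_a" solution.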
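You can change the classes of one data frame's columns to match those of another and then row-bind the datasets. This first version assumes both data frames have the same columns in the same order (the output below shows only x and y).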
library(dplyr)
library(purrr)
bind_rows(ds_a, map2_df(ds_b, map(ds_a, class), ~{class(.x) <- .y;.x}))
# x y
#1 1 5
#2 2 5
#3 3 5
#4 4 5
#5 5 5
#6 6 5
#7 1 5
#8 2 5
#9 3 5
#10 4 5
#11 5 5
#12 6 5
map2_df is used to change the classes of the ds_b columns, where
.x - passes each column of ds_b.
.y - map(ds_a, class) gets the class of each column in ds_a.
Inside the function, the class of .x is set to the value of .y, and map2_df binds the converted columns into a data frame. We then use bind_rows with the ds_a data frame.
If the data frames have an unequal number of columns, you can change the classes of only the common ones and then bind the rows.
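new_bind <- function(a, b) {
  # Columns present in both data frames
  common_cols <- intersect(names(a), names(b))
  # Set the class of each common column of b to that of the matching column of a
  b[common_cols] <- map2_df(b[common_cols],
                            map(a[common_cols], class), ~{class(.x) <- .y; .x})
  bind_rows(a, b)
}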
new_bind(ds_a, ds_b)
# x y z l p
#1 1 5 4 2 NA
#2 2 5 4 2 NA
#3 3 5 4 2 NA
#4 4 5 4 2 NA
#5 5 5 4 2 NA
#6 6 5 4 2 NA
#7 1 5 <NA> NA 2
#8 2 5 <NA> NA 2
#9 3 5 <NA> NA 2
#10 4 5 <NA> NA 2
#11 5 5 <NA> NA 2
#12 6 5 <NA> NA 2
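As a variation on the same idea, the sketch below converts the common columns with explicit as.*() functions instead of assigning the class. The helper name cast_like is hypothetical, and the sketch assumes the target columns in ds_a are integer, numeric, character, logical, or factor; columns of any other class are left untouched. Factors are converted through character first, because as.integer() on a factor returns the level codes rather than the labels. The example data here deliberately use factor labels that differ from their codes.

```r
ds_a <- data.frame(x = 1:3, y = 5, z = "4", stringsAsFactors = FALSE)
# The factor labels (10, 20, 30) differ from the level codes (1, 2, 3).
ds_b <- data.frame(x = factor(c(10, 20, 30)), y = "5", p = 2,
                   stringsAsFactors = FALSE)

# Hypothetical helper: cast the columns of `b` that also exist in `a`
# to the class of the matching column in `a`.
cast_like <- function(a, b) {
  converters <- list(
    integer = as.integer, numeric = as.numeric,
    character = as.character, logical = as.logical, factor = factor
  )
  for (col in intersect(names(a), names(b))) {
    target <- class(a[[col]])[1]
    convert <- converters[[target]]
    if (is.null(convert)) next  # unsupported class: leave the column as-is
    value <- b[[col]]
    # Go through character so factors yield their labels, not level codes.
    if (is.factor(value)) value <- as.character(value)
    b[[col]] <- convert(value)
  }
  b
}

ds_b_cast <- cast_like(ds_a, ds_b)
str(ds_b_cast)

# Bind the rows if dplyr is available; unshared columns are filled with NA.
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::bind_rows(ds_a, ds_b_cast))
}
```

Values that cannot be converted (for example, non-numeric strings cast to integer) become NA with a warning, so it is worth checking the result before binding.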