Automatically coerce all column types of one data frame to the type of another prior to binding

Tags: r, dplyr

Let's say I have two data frames I want to bind:

ds_a <- data.frame(
  x = 1:6,
  y = 5,
  z = "4",
  l = 2,
  stringsAsFactors = FALSE
)

ds_b <- data.frame(
  x = as.factor(1:6),
  y = "5",
  p = 2,
  stringsAsFactors = FALSE
)

When I try to bind them I get the following error:

> bind_rows(ds_a, ds_b)
Error: Can't combine `..1$x` <integer> and `..2$x` <factor<4c79c>>.

Typically what I do to solve this is I convert all the columns in both data frames to a character, bind the two data frames, and then manually re-convert all the columns back to their original type.

Is there a way to simply coerce all the type collisions between ds_a and ds_b by automatically casting ds_b's columns to match ds_a (assuming they're named the same)?

More generally, I'd like a solution to automatically convert all the columns in ds_b to the type of ds_a wherever the column names match. And the solution should work if ds_b and ds_a don't share all the same columns (just filling with NA when columns don't exist in one, but do in another).

Here's the intended outcome:

ds_merged <- read.table(text = 'x y z l p
1 1 5 4 2 NA
2 2 5 4 2 NA
3 3 5 4 2 NA
4 4 5 4 2 NA
5 5 5 4 2 NA
6 6 5 4 2 NA
7 1 5 NA NA 2
8 2 5 NA NA 2
9 3 5 NA NA 2
10 4 5 NA NA 2
11 5 5 NA NA 2
12 6 5 NA NA 2', header = TRUE, row.names = NULL)

> ds_merged

   row.names x y  z  l  p
1          1 1 5  4  2 NA
2          2 2 5  4  2 NA
3          3 3 5  4  2 NA
4          4 4 5  4  2 NA
5          5 5 5  4  2 NA
6          6 6 5  4  2 NA
7          7 1 5 NA NA  2
8          8 2 5 NA NA  2
9          9 3 5 NA NA  2
10        10 4 5 NA NA  2
11        11 5 5 NA NA  2
12        12 6 5 NA NA  2


asked Oct 25 '21 19:10

Parseltongue

2 Answers

We could use type.convert().

Explanation (added after a comment from the OP):

type.convert does not consider ds_a at all. You can check this by comparing glimpse(ds_a) with glimpse of the resulting data frame (result):

Note that the columns of ds_a have the same classes in result.

> # compare classes
> glimpse(ds_a)
Rows: 6
Columns: 4
$ x <int> 1, 2, 3, 4, 5, 6
$ y <dbl> 5, 5, 5, 5, 5, 5
$ z <chr> "4", "4", "4", "4", "4", "4"
$ l <dbl> 2, 2, 2, 2, 2, 2
> glimpse(ds_b)
Rows: 6
Columns: 3
$ x <fct> 1, 2, 3, 4, 5, 6
$ y <chr> "5", "5", "5", "5", "5", "5"
$ p <dbl> 2, 2, 2, 2, 2, 2
> glimpse(result)
Rows: 12
Columns: 5
$ x <int> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6
$ y <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
$ z <chr> "4", "4", "4", "4", "4", "4", NA, NA, NA, NA, NA, NA
$ l <dbl> 2, 2, 2, 2, 2, 2, NA, NA, NA, NA, NA, NA
$ p <int> NA, NA, NA, NA, NA, NA, 2, 2, 2, 2, 2, 2

What type.convert does is:

1. It applies the best-fitting class to the data of ds_b (notice that the %>% is inside bind_rows). All values of ds_b$x are whole numbers, so R converts ds_b$x from factor to integer.
2. All values of ds_b$y are character strings that represent whole numbers, so R converts ds_b$y from character to integer. This can be misleading: ds_a$y is double while ds_b$y is now integer. That is not a problem for bind_rows, because double takes precedence over integer when the two are combined.

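> # showing by hand what type.convert does to ds_b$x and ds_b$y
> # (go through as.character so the factor labels, not the internal codes, are converted)
> ds_b$x <- as.integer(as.character(ds_b$x))
> ds_b$y <- as.integer(ds_b$y)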
> ds_b %>% 
+   as_tibble()
# A tibble: 6 x 3
      x     y     p
  <int> <int> <dbl>
1     1     5     2
2     2     5     2
3     3     5     2
4     4     5     2
5     5     5     2
6     6     5     2
> bind_rows(ds_a, ds_b) %>% 
+   as_tibble()
# A tibble: 12 x 5
       x     y z         l     p
   <int> <dbl> <chr> <dbl> <dbl>
 1     1     5 4         2    NA
 2     2     5 4         2    NA
 3     3     5 4         2    NA
 4     4     5 4         2    NA
 5     5     5 4         2    NA
 6     6     5 4         2    NA
 7     1     5 NA       NA     2
 8     2     5 NA       NA     2
 9     3     5 NA       NA     2
10     4     5 NA       NA     2
11     5     5 NA       NA     2
12     6     5 NA       NA     2

Finally, type.convert converts ds_b$p from double to integer because its values are whole numbers.

Solution:

library(dplyr)
bind_rows(ds_a, ds_b %>% type.convert(as.is=TRUE))

output:

   x y    z  l  p
1  1 5    4  2 NA
2  2 5    4  2 NA
3  3 5    4  2 NA
4  4 5    4  2 NA
5  5 5    4  2 NA
6  6 5    4  2 NA
7  1 5 <NA> NA  2
8  2 5 <NA> NA  2
9  3 5 <NA> NA  2
10 4 5 <NA> NA  2
11 5 5 <NA> NA  2
12 6 5 <NA> NA  2
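To see why this works only by coincidence of the data, here is a minimal base R sketch of what type.convert produces for ds_b on its own. The ds_c data frame is a hypothetical addition, not part of the question: it suggests that a character column holding digits would be turned into integer, which would then clash with the character column ds_a$z.

```r
# type.convert() looks only at the values it is given, never at ds_a.
ds_b <- data.frame(
  x = as.factor(1:6),
  y = "5",
  p = 2,
  stringsAsFactors = FALSE
)

converted <- type.convert(ds_b, as.is = TRUE)
classes_b <- sapply(converted, class)
print(classes_b)

# Hypothetical column (not in the question): digits stored as character.
ds_c <- data.frame(z = "4", stringsAsFactors = FALSE)
class_z <- class(type.convert(ds_c, as.is = TRUE)$z)
print(class_z)
```

So this approach fits when the values in ds_b happen to imply the classes used in ds_a, but it is not a general way to cast ds_b to ds_a's types.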


answered Oct 19 '22 23:10

TarJae

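You can change the classes of one data frame to match another one and then row-bind the datasets. The first snippet pairs columns by position, so it assumes both data frames have the same columns in the same order; the output shown under it contains only x and y.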

library(dplyr)
library(purrr)

bind_rows(ds_a, map2_df(ds_b, map(ds_a, class), ~{class(.x) <- .y;.x}))

#   x y
#1  1 5
#2  2 5
#3  3 5
#4  4 5
#5  5 5
#6  6 5
#7  1 5
#8  2 5
#9  3 5
#10 4 5
#11 5 5
#12 6 5

map2_df is used to change the class of each column of ds_b, where

.x is a column of ds_b.

.y is the class of the corresponding column of ds_a, obtained with map(ds_a, class).

The function sets the class of .x to .y, and map2_df binds the converted columns back into a data frame. We then pass that result to bind_rows together with ds_a.
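As a rough illustration of what the function inside map2_df does to a single column, the base R sketch below assigns the class of one vector to another. The names y_a and y_b are made up for the example and stand in for ds_a$y and ds_b$y.

```r
y_a <- c(5, 5, 5)         # stand-in for ds_a$y (double)
y_b <- c("5", "5", "5")   # stand-in for ds_b$y (character)

target <- class(y_a)      # "numeric"
class(y_b) <- target      # assigning a basic type as the class coerces the values

print(class(y_b))
print(y_b)
```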

If the data frames do not share all their columns, you can change the classes of only the common ones and then bind the rows.

new_bind <- function(a, b) {
  common_cols <- intersect(names(a), names(b))
  b[common_cols] <- map2_df(b[common_cols], 
               map(a[common_cols], class), ~{class(.x) <- .y;.x})
  bind_rows(a, b)  
}
new_bind(ds_a, ds_b) 

#   x y    z  l  p
#1  1 5    4  2 NA
#2  2 5    4  2 NA
#3  3 5    4  2 NA
#4  4 5    4  2 NA
#5  5 5    4  2 NA
#6  6 5    4  2 NA
#7  1 5 <NA> NA  2
#8  2 5 <NA> NA  2
#9  3 5 <NA> NA  2
#10 4 5 <NA> NA  2
#11 5 5 <NA> NA  2
#12 6 5 <NA> NA  2
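One caveat with class(.x) <- .y: for a factor it keeps the internal integer codes rather than the labels, which gives the right numbers here only because the levels are 1 to 6. A possible variant, sketched in base R below, matches columns by name and converts through as.character first. The helper name cast_like and the example values 10, 20, 30 are assumptions for illustration, and only a few basic classes are handled.

```r
# Hypothetical helper: cast the columns of b that also exist in a
# to the class of the matching column in a.
cast_like <- function(b, a) {
  common <- intersect(names(a), names(b))
  for (col in common) {
    target <- class(a[[col]])[1]
    value <- b[[col]]
    # Use the labels of a factor, not its internal codes.
    if (is.factor(value)) value <- as.character(value)
    b[[col]] <- switch(
      target,
      integer   = as.integer(value),
      numeric   = as.numeric(value),
      character = as.character(value),
      logical   = as.logical(value),
      factor    = factor(value),
      value     # any other class: leave the column unchanged
    )
  }
  b
}

# Example data (assumed): factor labels that differ from the codes 1, 2, 3.
ds_a <- data.frame(x = 1:3, y = 5, z = "4", stringsAsFactors = FALSE)
ds_b <- data.frame(x = factor(c(10, 20, 30)), y = "5", p = 2,
                   stringsAsFactors = FALSE)

codes <- ds_b$x
class(codes) <- "integer"     # keeps the codes, not the labels
print(as.integer(codes))

converted <- cast_like(ds_b, ds_a)
print(converted$x)
print(sapply(converted, class))
```

The result can then be passed on as bind_rows(ds_a, cast_like(ds_b, ds_a)); filling columns that exist on only one side with NA is left to bind_rows.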

answered Oct 19 '22 22:10

Ronak Shah
