I want to append one data frame to another (the master one). The problem is that only subset of their columns are common. Also, the order of their columns might be different.
Master dataframe:
a b c
r1 1 2 -2
r2 2 4 -4
r3 3 6 -6
r4 4 8 -8
New dataframe:
d a c
r1 -120 10 -20
r2 -140 20 -40
Expected result:
a b c
r1 1 2 -2
r2 2 4 -4
r3 3 6 -6
r4 4 8 -8
r5 10 NaN -20
r6 20 NaN -40
Is there any smart way of doing this? This is a similar question but the setup is different.
append() function is used to append rows of other dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframes are added as new columns and the new cells are populated with NaN value. Parameters: other : DataFrame or Series/dict-like object, or list of these.
Method 1 : Using plyr package rbind. fill() method in R is an enhancement of the rbind() method in base R, is used to combine data frames with different columns. The column names are number may be different in the input data frames. Missing columns of the corresponding data frames are filled with NA.
Dataframe append syntax Using the append method on a dataframe is very simple. You type the name of the first dataframe, and then . append() to call the method. Then inside the parenthesis, you type the name of the second dataframe, which you want to append to the end of the first.
The above code throws an error that the column names must match. So, the column names in both the data frames must be the same if you want to use rbind().
Check out the bind_rows
function in the dplyr
package. It will do some nice things for you by default, such as filling in columns that exist in one data.frame
but not the other with NA
s instead of just failing. Here is an example:
# Use the dplyr package for binding rows and for selecting columns
library(dplyr)
# Generate some example data
a <- data.frame(a = rnorm(10), b = rnorm(10))
b <- data.frame(a = rnorm(5), c = rnorm(5))
# Stack data frames
bind_rows(a, b)
Source: local data frame [15 x 3]
a b c
1 2.2891895 0.1940835 NA
2 0.7620825 -0.2441634 NA
3 1.8289665 1.5280338 NA
4 -0.9851729 -0.7187585 NA
5 1.5829853 1.6609695 NA
6 0.9231296 1.8052112 NA
7 -0.5801230 -0.6928449 NA
8 0.2033514 -0.6673596 NA
9 -0.8576628 0.5163021 NA
10 0.6296633 -1.2445280 NA
11 2.1693068 NA -0.2556584
12 -0.1048966 NA -0.3132198
13 0.2673514 NA -1.1181995
14 1.0937759 NA -2.5750115
15 -0.8147180 NA -1.5525338
To solve the problem in your question, you would want to select for the columns in your master data.frame
first. If a
is the master data.frame
, and b
contains data that you want to add, you can use the select
function from dplyr
to get the columns that you need first.
# Select all columns in b with the same names as in master data, a
# Use select_() instead of select() to do standard evaluation.
b <- select_(b, names(a))
# Combine
bind_rows(a, b)
Source: local data frame [15 x 2]
a b
1 2.2891895 0.1940835
2 0.7620825 -0.2441634
3 1.8289665 1.5280338
4 -0.9851729 -0.7187585
5 1.5829853 1.6609695
6 0.9231296 1.8052112
7 -0.5801230 -0.6928449
8 0.2033514 -0.6673596
9 -0.8576628 0.5163021
10 0.6296633 -1.2445280
11 2.1693068 NA
12 -0.1048966 NA
13 0.2673514 NA
14 1.0937759 NA
15 -0.8147180 NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With