 

Append a data frame to a master data frame if some columns are common [duplicate]

I want to append one data frame to another (the master one). The problem is that only a subset of their columns is common, and the columns may also be in a different order.

Master dataframe:

   a b  c
r1 1 2 -2
r2 2 4 -4
r3 3 6 -6
r4 4 8 -8

New dataframe:

      d  a   c
r1 -120 10 -20
r2 -140 20 -40

Expected result:

    a   b    c
r1  1   2   -2
r2  2   4   -4
r3  3   6   -6
r4  4   8   -8
r5 10 NaN  -20
r6 20 NaN  -40

Is there a smart way to do this? This is a similar question, but the setup is different.

asked Dec 14 '15 at 21:12 by Szilard



People also ask

How do I append a DataFrame to another data frame?

The pandas DataFrame.append() method appends the rows of another DataFrame to the end of the given one and returns a new DataFrame. Columns that are not in the original DataFrame are added as new columns, and the new cells are filled with NaN. Its `other` parameter accepts a DataFrame, a Series/dict-like object, or a list of these. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() is its replacement.

How do I append data frames with different columns in R?

One option is rbind.fill() from the plyr package. It is an enhancement of base R's rbind() for combining data frames with different columns: the column names and the number of columns may differ between the input data frames, and columns missing from a data frame are filled with NA.
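plyr is not always installed, so here is a minimal base-R sketch of the same idea; it is not plyr's actual implementation. The helper name `rbind_fill_base()` and the sample data are hypothetical: the helper takes the union of all column names, adds each missing column filled with NA, puts the columns in a common order, and then calls rbind().

```r
# Hypothetical helper: a base-R sketch of the idea behind plyr::rbind.fill(),
# stacking data frames whose column sets differ and filling the gaps with NA.
rbind_fill_base <- function(...) {
  dfs <- list(...)
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    # Add every column this data frame lacks, filled with NA
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- NA
    }
    # Put the columns in the common order
    df[all_cols]
  })
  do.call(rbind, padded)
}

# Made-up example data with partially overlapping columns
x <- data.frame(a = 1:2, b = c(10, 20))
y <- data.frame(a = 3L, c = 30)
print(rbind_fill_base(x, y))
```

Unlike what the question asks for, this keeps every column, including ones that exist only in the data being appended.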

How do you append a data frame?

Using the append method on a DataFrame is simple: type the name of the first DataFrame followed by .append(), and put the name of the second DataFrame inside the parentheses, e.g. df1.append(df2). The rows of the second DataFrame are added to the end of the first, and a new DataFrame is returned.

Does rbind() require the same number of columns?

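Yes. For data frames, base R's rbind() stops with an error when the column names do not match, so both data frames must have the same set of column names (and therefore the same number of columns). The columns are matched by name, though, so their order may differ.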
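A small base-R check of both points, using made-up data frames: rbind() succeeds when the same columns merely appear in a different order, because columns are matched by name, but fails when the column names differ. The error is caught with tryCatch() here only so the script keeps running.

```r
df1 <- data.frame(a = 1:2, c = c(-2, -4))
df2 <- data.frame(c = c(-20, -40), a = c(10L, 20L))  # same columns, different order
df3 <- data.frame(d = -120, a = 10L)                  # different column names

# Same names in a different order: values are lined up by column name
ok <- rbind(df1, df2)
print(ok)

# Different names: rbind() signals an error; return its message instead
res <- tryCatch(rbind(df1, df3), error = function(e) conditionMessage(e))
print(res)
```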


1 Answer

Check out the bind_rows() function in the dplyr package. By default it fills columns that exist in one data.frame but not the other with NA, instead of failing the way rbind() does. Here is an example:

# Use the dplyr package for binding rows and for selecting columns
library(dplyr)

# Generate some example data
a <- data.frame(a = rnorm(10), b = rnorm(10))
b <- data.frame(a = rnorm(5), c = rnorm(5))

# Stack data frames
bind_rows(a, b)

Source: local data frame [15 x 3]

            a          b          c
1   2.2891895  0.1940835         NA
2   0.7620825 -0.2441634         NA
3   1.8289665  1.5280338         NA
4  -0.9851729 -0.7187585         NA
5   1.5829853  1.6609695         NA
6   0.9231296  1.8052112         NA
7  -0.5801230 -0.6928449         NA
8   0.2033514 -0.6673596         NA
9  -0.8576628  0.5163021         NA
10  0.6296633 -1.2445280         NA
11  2.1693068         NA -0.2556584
12 -0.1048966         NA -0.3132198
13  0.2673514         NA -1.1181995
14  1.0937759         NA -2.5750115
15 -0.8147180         NA -1.5525338

To solve the problem in your question, first reduce the new data to the columns of your master data.frame. If a is the master data.frame and b contains the data you want to add, dplyr's select() can keep only the columns of b that also exist in a; bind_rows() then fills the remaining master columns with NA.

# Keep only the columns of b whose names also appear in the master data, a.
# any_of() silently skips names in a that b does not have (here, column b).
b <- select(b, any_of(names(a)))

# Combine
bind_rows(a, b)

Source: local data frame [15 x 2]

            a          b
1   2.2891895  0.1940835
2   0.7620825 -0.2441634
3   1.8289665  1.5280338
4  -0.9851729 -0.7187585
5   1.5829853  1.6609695
6   0.9231296  1.8052112
7  -0.5801230 -0.6928449
8   0.2033514 -0.6673596
9  -0.8576628  0.5163021
10  0.6296633 -1.2445280
11  2.1693068         NA
12 -0.1048966         NA
13  0.2673514         NA
14  1.0937759         NA
15 -0.8147180         NA
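For comparison, here is a base-R sketch applied to the data from the question, with no packages needed. It assumes you want to keep only the master's columns (dropping d) and fill the master's columns that the new data lacks (b) with NA, which R prints as NA rather than the NaN shown in the question. Renaming the new rows to r5 and r6 is an assumption made to match the expected result.

```r
# Data from the question
master <- data.frame(a = 1:4, b = c(2, 4, 6, 8), c = c(-2, -4, -6, -8),
                     row.names = paste0("r", 1:4))
new <- data.frame(d = c(-120, -140), a = c(10, 20), c = c(-20, -40),
                  row.names = c("r1", "r2"))

# Keep only the columns the two data frames share (drops d)
new <- new[intersect(names(new), names(master))]
# Add the master's columns that new lacks (here b), filled with NA
for (col in setdiff(names(master), names(new))) new[[col]] <- NA
# Use the master's column order and give the appended rows fresh names
new <- new[names(master)]
rownames(new) <- paste0("r", nrow(master) + seq_len(nrow(new)))

result <- rbind(master, new)
print(result)
```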
answered Sep 30 '22 at 02:09 by ialm
