
Combine (rbind) data frames and create column with name of original data frames

Tags:

r

I have several data frames that I want to combine by row. In the resulting single data frame, I want to create a new variable identifying which data set the observation came from.

    # original data frames
    df1 <- data.frame(x = c(1, 3), y = c(2, 4))
    df2 <- data.frame(x = c(5, 7), y = c(6, 8))

    # desired, combined data frame
    df3 <- data.frame(x = c(1, 3, 5, 7), y = c(2, 4, 6, 8),
                      source = c("df1", "df1", "df2", "df2"))
    # x y source
    # 1 2    df1
    # 3 4    df1
    # 5 6    df2
    # 7 8    df2

How can I achieve this? Thanks in advance!

maloneypatr asked Mar 01 '13 16:03




2 Answers

It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind...)

    > do.call(rbind, list(df1 = df1, df2 = df2))
          x y
    df1.1 1 2
    df1.2 3 4
    df2.1 5 6
    df2.2 7 8

Notice that the row names now reflect the source data.frames.
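
If an actual column is wanted rather than row names, one possible follow-up is to derive it from those row names. This is only a sketch and not part of the original answer: it assumes the source data frames have default row names, so that rbind produces names of the form df1.1, df1.2, and it borrows the column name source from the question.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

combined <- do.call(rbind, list(df1 = df1, df2 = df2))

# Row names look like "df1.1"; drop the trailing ".<number>" to recover the list name
combined$source <- sub("\\.[0-9]+$", "", rownames(combined))

# Reset the row names to plain 1..n now that the information lives in a column
rownames(combined) <- NULL

combined
```

If the inputs carry their own meaningful row names, the suffix will not be a plain number and the pattern would need adjusting.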

Update: Use cbind and rbind

Another option is to make a basic function like the following:

    AppendMe <- function(dfNames) {
      do.call(rbind, lapply(dfNames, function(x) {
        cbind(get(x), source = x)
      }))
    }

This function then takes a character vector of the data.frame names that you want to "stack", as follows:

    > AppendMe(c("df1", "df2"))
      x y source
    1 1 2    df1
    2 3 4    df1
    3 5 6    df2
    4 7 8    df2
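
AppendMe looks objects up by name with get(), so it depends on where it is called from. A variant that takes a named list instead might look like the sketch below. The function name append_named is hypothetical, chosen only for illustration, and the sketch assumes every data frame has at least one row and the same columns.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# append_named is a hypothetical helper, not from the original answer.
# It expects a *named* list of data frames with identical columns.
append_named <- function(dfs) {
  stopifnot(is.list(dfs), !is.null(names(dfs)))
  # Pair each data frame with its list name and add that name as a column
  pieces <- Map(function(d, nm) cbind(d, source = nm), dfs, names(dfs))
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # drop the "df1.1"-style row names added by rbind
  out
}

combined <- append_named(list(df1 = df1, df2 = df2))
combined
```

Passing the list explicitly also makes it possible to label the pieces with something other than the variable names, for example list(control = df1, treated = df2).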

Update 2: Use combine from the "gdata" package

    > library(gdata)
    > combine(df1, df2)
      x y source
    1 1 2    df1
    2 3 4    df1
    3 5 6    df2
    4 7 8    df2

Update 3: Use rbindlist from "data.table"

Another option now available is rbindlist from "data.table", using its idcol argument:

    > library(data.table)
    > rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
       .id x y
    1: df1 1 2
    2: df1 3 4
    3: df2 5 6
    4: df2 7 8
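
The mget(ls(pattern = ...)) idiom used here builds the named list automatically. A base R sketch of what it does is shown below; the stricter anchored pattern is an assumption added for illustration, not what the answer uses.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# ls() returns the names of objects in the current environment that match
# the regular expression; anchoring with ^ and $ skips names like "mydf10x"
df_names <- ls(pattern = "^df[0-9]+$")

# mget() looks those names up and returns a list named after the objects
dfs <- mget(df_names)

names(dfs)
```

Note that the pattern picks up every matching object in the workspace, so a previously created df3 would be stacked as well. Listing the data frames explicitly, as in list(df1 = df1, df2 = df2), avoids that.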

Update 4: Use map_df from "purrr"

Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.

    > library(purrr)
    > mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
    Source: local data frame [4 x 3]

        src     x     y
      (chr) (int) (int)
    1   df1     1     2
    2   df1     3     4
    3   df2     5     6
    4   df2     7     8
A5C1D2H2I1M1N2O1R2T1 answered Sep 29 '22 22:09



Another approach using dplyr:

    df1 <- data.frame(x = c(1,3), y = c(2,4))
    df2 <- data.frame(x = c(5,7), y = c(6,8))

    df3 <- dplyr::bind_rows(list(df1=df1, df2=df2), .id = 'source')

    df3
    Source: local data frame [4 x 3]

      source     x     y
       (chr) (dbl) (dbl)
    1    df1     1     2
    2    df1     3     4
    3    df2     5     6
    4    df2     7     8
chriad answered Sep 29 '22 22:09
