I have several data frames that I want to combine by row. In the resulting single data frame, I want to create a new variable identifying which data set the observation came from.
    # original data frames
    df1 <- data.frame(x = c(1, 3), y = c(2, 4))
    df2 <- data.frame(x = c(5, 7), y = c(6, 8))

    # desired, combined data frame
    df3 <- data.frame(x = c(1, 3, 5, 7), y = c(2, 4, 6, 8),
                      source = c("df1", "df1", "df2", "df2"))
    # x y source
    # 1 2 df1
    # 3 4 df1
    # 5 6 df2
    # 7 8 df2
How can I achieve this? Thanks in advance!
The merge() function in base R merges data frames by common columns or row names. By default it keeps only the rows whose keys match in both data frames, so it behaves like an inner join; the all, all.x, and all.y arguments keep unmatched rows. In the result, the key columns come first, followed by the remaining columns of the first data frame and then those of the second.
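As a rough illustration of that default, here is a minimal sketch. The data frames `left` and `right` and their `id` column are made up for this example and are not part of the question.

```r
# Hypothetical example data: both frames share an "id" key column.
left  <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
right <- data.frame(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

# Default merge(): only ids present in both frames are kept (inner join).
inner <- merge(left, right, by = "id")

# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA.
outer <- merge(left, right, by = "id", all = TRUE)

print(inner)
print(outer)
```

Note that merge() joins columns side by side on a key; it does not stack rows, so it is not a direct answer to the question above.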
To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but the variables do not have to be in the same order. If data frame A has variables that data frame B does not, one option is to delete the extra variables from data frame A before binding.
rbind can join two data frames whose columns have the same names even if the columns are in a different order.
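A small sketch of that name matching, using made-up frames `a` and `b` that hold the same columns in opposite order:

```r
# Same column names, different column order (example data for illustration).
a <- data.frame(x = c(1, 3), y = c(2, 4))
b <- data.frame(y = c(6, 8), x = c(5, 7))

# For data frames, rbind() matches columns by name, not by position,
# and the result follows the column order of the first argument.
stacked <- rbind(a, b)
print(stacked)
```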
To join more than two R data frames, you can use reduce() from the purrr package, which is part of the tidyverse. Put the data frames in a list, and reduce() applies a two-table join to them one after another, matching on the shared column.
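The same folding idea can be sketched without any packages. The example below swaps in base R's Reduce() for purrr's reduce(); that substitution, the `scores` list, and its `id` column are all assumptions made for illustration.

```r
# Hypothetical list of data frames that share an "id" key column.
scores <- list(
  data.frame(id = 1:3, a = c(10, 20, 30)),
  data.frame(id = 1:3, b = c(0.1, 0.2, 0.3)),
  data.frame(id = 2:3, c = c("u", "v"))
)

# Reduce() merges the first two frames, then merges that result with the
# third, and so on. Each step is a default (inner) merge on "id".
joined <- Reduce(function(left, right) merge(left, right, by = "id"), scores)
print(joined)
```

Because every step is an inner join, only ids present in all three frames survive.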
It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind, ...):
    > do.call(rbind, list(df1 = df1, df2 = df2))
          x y
    df1.1 1 2
    df1.2 3 4
    df2.1 5 6
    df2.2 7 8
Notice that the row names now reflect the source data.frames.
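If you want an actual column rather than row names, one possible follow-up is to strip the numeric suffix from the row names. This is only a sketch and assumes the list names themselves do not end in a dot followed by digits.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

combined <- do.call(rbind, list(df1 = df1, df2 = df2))

# Row names look like "df1.1" or "df2.2"; drop the trailing ".<number>".
# A one-row data frame gets a row name with no suffix, which sub() leaves as is.
combined$source <- sub("\\.[0-9]+$", "", rownames(combined))

# Reset the row names to plain 1..n now that the source is a column.
rownames(combined) <- NULL
print(combined)
```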
Use cbind and rbind
Another option is to make a basic function like the following:
    AppendMe <- function(dfNames) {
      do.call(rbind, lapply(dfNames, function(x) {
        cbind(get(x), source = x)
      }))
    }
This function then takes a character vector of the data.frame names that you want to "stack", as follows:
    > AppendMe(c("df1", "df2"))
      x y source
    1 1 2    df1
    2 3 4    df1
    3 5 6    df2
    4 7 8    df2
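A variation on the same idea, sketched here as an assumption rather than as part of the original answer, takes a named list instead of looking objects up by name with get(). The function name `stack_with_source` is hypothetical.

```r
# Hypothetical helper: stack a *named* list of data frames and record
# each list name in a "source" column.
stack_with_source <- function(dfs) {
  stopifnot(is.list(dfs), !is.null(names(dfs)))
  # Pair each data frame with its list name and add the name as a column.
  labelled <- Map(function(d, nm) cbind(d, source = nm), dfs, names(dfs))
  out <- do.call(rbind, labelled)
  rownames(out) <- NULL  # drop the "df1.1"-style row names
  out
}

df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

result <- stack_with_source(list(df1 = df1, df2 = df2))
print(result)
```

Passing the data frames directly avoids depending on what happens to exist in the calling environment, at the cost of typing the names once in the list.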
Use combine from the "gdata" package

    > library(gdata)
    > combine(df1, df2)
      x y source
    1 1 2    df1
    2 3 4    df1
    3 5 6    df2
    4 7 8    df2
Use rbindlist from "data.table"

Another approach that can be used now is rbindlist from "data.table" with its idcol argument:

    > library(data.table)
    > rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
       .id x y
    1: df1 1 2
    2: df1 3 4
    3: df2 5 6
    4: df2 7 8
Use map_df from "purrr"

Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.

    > library(purrr)
    > mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
    Source: local data frame [4 x 3]

        src     x     y
      (chr) (int) (int)
    1   df1     1     2
    2   df1     3     4
    3   df2     5     6
    4   df2     7     8
Another approach using dplyr:

    df1 <- data.frame(x = c(1,3), y = c(2,4))
    df2 <- data.frame(x = c(5,7), y = c(6,8))
    df3 <- dplyr::bind_rows(list(df1=df1, df2=df2), .id = 'source')
    df3
    Source: local data frame [4 x 3]

      source     x     y
       (chr) (dbl) (dbl)
    1    df1     1     2
    2    df1     3     4
    3    df2     5     6
    4    df2     7     8