I have several data frames that I want to combine by row. In the resulting single data frame, I want to create a new variable identifying which data set the observation came from.
```r
# original data frames
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# desired, combined data frame
df3 <- data.frame(x = c(1, 3, 5, 7), y = c(2, 4, 6, 8),
                  source = c("df1", "df1", "df2", "df2"))
# x y source
# 1 2 df1
# 3 4 df1
# 5 6 df2
# 7 8 df2
```

How can I achieve this? Thanks in advance!
The merge() function in base R joins data frames by common columns or row names. It combines them side by side, like a database join, rather than stacking their rows. By default it performs an inner join on all the columns the data frames share, so only rows whose key values match in both inputs are kept, and the result is sorted on those key columns unless sort = FALSE.
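For contrast, here is a minimal base-R sketch using the question's df1 and df2. Because merge() joins on the shared columns, an inner join of two frames with no matching rows is empty, and even a full outer join (all = TRUE) keeps every row without recording which frame it came from.

```r
# The question's two data frames
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# Default merge(): inner join on the shared columns x and y.
# No (x, y) pair appears in both frames, so the result has zero rows.
inner <- merge(df1, df2)

# all = TRUE requests a full outer join: every row is kept,
# but nothing says which data frame each row came from.
outer <- merge(df1, df2, all = TRUE)

nrow(inner)   # 0
nrow(outer)   # 4
names(outer)  # "x" "y"
```

This is why merge() alone does not produce the source column the question asks for.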
To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order, because rbind matches data frame columns by name. If data frameA has variables that data frameB does not, then either delete the extra variables in data frameA, or create the additional variables in data frameB and set them to NA before joining them with rbind.
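A short sketch of both points, assuming a hypothetical extra frame df_extra (not from the thread) that carries a variable z the others lack. The second df2 here deliberately lists its columns in a different order to show that rbind() matches them by name.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
# Same variables as df1, deliberately listed in a different order
df2 <- data.frame(y = c(6, 8), x = c(5, 7))

# rbind() matches data frame columns by name, not by position
stacked <- rbind(df1, df2)
stacked$x  # 1 3 5 7
stacked$y  # 2 4 6 8

# Hypothetical frame with an extra variable z that the others lack
df_extra <- data.frame(x = 9, y = 10, z = "a")

# Option 1: drop the extra variable before stacking
dropped <- rbind(stacked, df_extra[names(stacked)])

# Option 2: add the missing variable to the other frame as NA first
padded <- stacked
padded$z <- NA_character_
padded <- rbind(padded, df_extra)
padded$z  # NA NA NA NA "a"
```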
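To join more than two (multiple) data frames, put them in a list and use reduce() from the purrr package (part of the tidyverse) or base R's Reduce(). Both apply a two-argument function, such as rbind or a join, repeatedly: the first two data frames are combined, then that result is combined with the third, and so on.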
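A minimal sketch of the pattern with base R's Reduce() rather than purrr::reduce(), assuming a hypothetical third frame df_new. Setting accumulate = TRUE keeps each intermediate result, which makes the pairwise folding visible.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))
df_new <- data.frame(x = 9, y = 10)  # hypothetical third frame

dfs <- list(df1, df2, df_new)

# Equivalent to rbind(rbind(df1, df2), df_new)
combined <- Reduce(rbind, dfs)

# Intermediate results: df1, then df1 + df2, then all three
steps <- Reduce(rbind, dfs, accumulate = TRUE)
vapply(steps, nrow, integer(1))  # 2 4 5
```

Note that this stacks the rows but, like plain rbind, does not add a column identifying the source.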
It's not exactly what you asked for, but it's pretty close. Put your objects in a named list and use do.call(rbind, ...):
```
> do.call(rbind, list(df1 = df1, df2 = df2))
      x y
df1.1 1 2
df1.2 3 4
df2.1 5 6
df2.2 7 8
```

Notice that the row names now reflect the source data.frames.
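If you want an actual column instead of row names, one base-R sketch (an alternative to parsing the row names, not part of the original answer) is to repeat each list name once per row of its data frame.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

dfs <- list(df1 = df1, df2 = df2)

# unname() so rbind() keeps plain 1..n row names instead of df1.1, df1.2, ...
res <- do.call(rbind, unname(dfs))

# Repeat each list name once per row of the corresponding data frame
res$source <- rep(names(dfs), times = vapply(dfs, nrow, integer(1)))
res
```

Using the row counts rather than the generated row names avoids depending on how rbind() builds those names (for example, a one-row frame gets just "df1" with no ".1" suffix).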
cbind and rbind

Another option is to make a basic function like the following:
```r
AppendMe <- function(dfNames) {
  do.call(rbind, lapply(dfNames, function(x) {
    # look up the data frame by its name and tag every row with that name
    cbind(get(x), source = x)
  }))
}
```

This function then takes a character vector of the data.frame names that you want to "stack", as follows:
```
> AppendMe(c("df1", "df2"))
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2
```

combine from the "gdata" package

```
> library(gdata)
> combine(df1, df2)
  x y source
1 1 2    df1
2 3 4    df1
3 5 6    df2
4 7 8    df2
```

rbindlist from "data.table"

Another approach that can be used now is to use rbindlist from "data.table" and its idcol argument. With that, the approach could be:
```
> library(data.table)
> rbindlist(mget(ls(pattern = "df\\d+")), idcol = TRUE)
   .id x y
1: df1 1 2
2: df1 3 4
3: df2 5 6
4: df2 7 8
```

map_df from "purrr"

Similar to rbindlist, you can also use map_df from "purrr" with I or c as the function to apply to each list element.
```
> library(purrr)
> mget(ls(pattern = "df\\d+")) %>% map_df(I, .id = "src")
Source: local data frame [4 x 3]

    src     x     y
  (chr) (int) (int)
1   df1     1     2
2   df1     3     4
3   df2     5     6
4   df2     7     8
```
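The mget(ls(pattern = ...)) idiom in the rbindlist and map_df examples collects every object whose name matches a regular expression into a named list. A base-R sketch combining it with Map() gives the same source column as AppendMe() without calling get() on each name. The anchored pattern "^df\\d+$" is an assumption added here so that names such as mydf1 are not picked up; also note that ls() returns names in sorted order, so a df10 would come before df2.

```r
df1 <- data.frame(x = c(1, 3), y = c(2, 4))
df2 <- data.frame(x = c(5, 7), y = c(6, 8))

# Named list of every object called df<digits> in the current environment
dfs <- mget(ls(pattern = "^df\\d+$"))

# Map() pairs each data frame with its name and tags every row with it
tagged <- Map(function(d, nm) cbind(d, source = nm), dfs, names(dfs))

# unname() keeps plain row names in the stacked result
res <- do.call(rbind, unname(tagged))
res
```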
Another approach using dplyr:
```r
df1 <- data.frame(x = c(1,3), y = c(2,4))
df2 <- data.frame(x = c(5,7), y = c(6,8))
df3 <- dplyr::bind_rows(list(df1=df1, df2=df2), .id = 'source')
df3
```

```
Source: local data frame [4 x 3]

  source     x     y
   (chr) (dbl) (dbl)
1    df1     1     2
2    df1     3     4
3    df2     5     6
4    df2     7     8
```