I have two data frames that have some columns with the same names and others with different names. The data frames look something like this:
df1       ID hello world hockey soccer     1  1    NA    NA      7      4     2  2    NA    NA      2      5     3  3    10     8      8     23     4  4     4    17      5     12     5  5    NA    NA      3     43  df2           ID hello world football baseball     1  1     2     3       43        6     2  2     5     1       24       32     3  3    NA    NA        2       23     4  4    NA    NA        5       15     5  5     9     7       12       23 As you can see, in 2 of the shared columns ("hello" and "world"), some of the data is in one of the data frames and the rest is in the other.
What I am trying to do is (1) merge the 2 data frames by "id", (2) combine all the data from the "hello" and "world" columns in both frames into 1 "hello" column and 1 "world" column, and (3) have the final data frame also contain all of the other columns in the 2 original frames ("hockey", "soccer", "football", "baseball"). So, I want the final result to be this:
  ID hello world hockey soccer football baseball 1  1     2     3      7      4        43       6 2  2     5     3      2      5        24      32 3  3    10     8      8     23         2      23 4  4     4    17      5     12         5      15 5  5     9     7      3     43        12      23 I'm pretty new at R so the only codes I've tried are variations on merge and I've tried the answer I found here, which was based on a similar question: R: merging copies of the same variable. However, my data sets are actually much bigger than what I'm showing here (there's about 20 matching columns (like "hello" and "world") and 100s of non-matching ones (like "hockey" and "football")) so I'm looking for something that won't require me to write them all out manually.
Any idea if this can be done? I'm sorry I can't provide a sample of my efforts, but I really don't know where to start besides:
mydata <- merge(df1, df2, by=c("ID"), all = TRUE) To reproduce the data frames:
df1 <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L), hellow = c(2, 5, NA, NA, 9),         world = c(3, 1, NA, NA, 7), football = c(43, 24, 2, 5, 12),         baseball = c(6, 32, 23, 15, 23)), .Names = c("ID", "hello", "world",         "football", "baseball"), class = "data.frame", row.names = c(NA, -5L))   df2 <- structure(list(ID = c(1L, 2L, 3L, 4L, 5L), hellow = c(NA, NA, 10, 4, NA),         world = c(NA, NA, 8, 17, NA), hockey = c(7, 2, 8, 5, 3),         soccer = c(4, 5, 23, 12, 43)), .Names = c("ID", "hello", "world", "hockey",         "soccer"), class = "data.frame", row.names = c(NA, -5L)) If you do not specify the merge column (s) with on, then Pandas will use any columns with the same name as the merge keys. Before getting into the details of how to use merge (), you should first understand the various forms of joins:
Merge columns of data into one cell without losing data by Clipboard. To merge multiple columns into only one cell without losing any data in Excel, you can use the Clipboard to solve the problem easily. 1. First of all, enable the Clipboard by clicking the Anchor button at the bottom-right corner of Clipboard group on the Home tab.
It defaults to 'inner', but other possible options include 'outer', 'left', and 'right'. on: Use this to tell merge () which columns or indices (also called key columns or key indices) you want to join on.
1 After installing Kutools for Excel, please select the columns of data that you want to merge, and then click Kutools... 2 In the Combine Columns or Rows dialog box, specify the options as follows: More ...
Here's an approach that involves melting your data, merging the molten data, and using dcast to get it back to a wide form. I've added comments to help understand what is going on.
## Required packages library(data.table) library(reshape2)  dcast.data.table(   merge(     ## melt the first data.frame and set the key as ID and variable     setkey(melt(as.data.table(df1), id.vars = "ID"), ID, variable),      ## melt the second data.frame     melt(as.data.table(df2), id.vars = "ID"),      ## you'll have 2 value columns...     all = TRUE)[, value := ifelse(       ## ... combine them into 1 with ifelse       is.na(value.x), value.y, value.x)],    ## This is your reshaping formula   ID ~ variable, value.var = "value") #    ID hello world football baseball hockey soccer # 1:  1     2     3       43        6      7      4 # 2:  2     5     1       24       32      2      5 # 3:  3    10     8        2       23      8     23 # 4:  4     4    17        5       15      5     12 # 5:  5     9     7       12       23      3     43 Nobody's posted a dplyr solution, so here's a succinct option in dplyr. The approach is simply to do a full_join that combines all rows, then group and summarise to remove the redundant missing cells. 
library(tidyverse) df1 <- structure(list(ID = 1:5, hello = c(NA, NA, 10L, 4L, NA), world = c(NA, NA, 8L, 17L, NA), hockey = c(7L, 2L, 8L, 5L, 3L), soccer = c(4L, 5L, 23L, 12L, 43L)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_integer", "collector")), hello = structure(list(), class = c("collector_integer", "collector")), world = structure(list(), class = c("collector_integer", "collector")), hockey = structure(list(), class = c("collector_integer", "collector")), soccer = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec")) df2 <- structure(list(ID = 1:5, hello = c(2L, 5L, NA, NA, 9L), world = c(3L, 1L, NA, NA, 7L), football = c(43L, 24L, 2L, 5L, 12L), baseball = c(6L, 32L, 23L, 15L, 2L)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_integer", "collector")), hello = structure(list(), class = c("collector_integer", "collector")), world = structure(list(), class = c("collector_integer", "collector")), football = structure(list(), class = c("collector_integer", "collector")), baseball = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))  df1 %>%   full_join(df2, by = intersect(colnames(df1), colnames(df2))) %>%   group_by(ID) %>%   summarize_all(na.omit) #> # A tibble: 5 x 7 #>      ID hello world hockey soccer football baseball #>   <int> <int> <int>  <int>  <int>    <int>    <int> #> 1     1     2     3      7      4       43        6 #> 2     2     5     1      2      5       24       32 #> 3     3    10     8      8     23        2       23 #> 4     4     4    17      5     12        5       15 #> 5     5     9     7      3     43       12        2 Created on 2018-07-13 by the reprex package (v0.2.0).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With