
Merge unequal dataframes and replace missing rows with 0

Tags: merge, dataframe, r

I have two data.frames, one with only characters and the other one with characters and values.

    df1 = data.frame(x=c('a', 'b', 'c', 'd', 'e'))
    df2 = data.frame(x=c('a', 'b', 'c'),y = c(0,1,0))
    merge(df1, df2)
      x y
    1 a 0
    2 b 1
    3 c 0

I want to merge df1 and df2. The rows for a, b and c merge correctly and get the values 0, 1, 0, but d and e are dropped from the result. I want d and e in the merged table as well, with y set to 0. That is, for every row of df1 that has no match in df2, y should be 0, like:

      x y
    1 a 0
    2 b 1
    3 c 0
    4 d 0
    5 e 0
Lisann asked May 11 '11 14:05




1 Answer

Take a look at the help page for merge. The all parameter lets you specify different types of merges. Here we want to set all = TRUE. This will make merge return NA for the values that don't match, which we can update to 0 with is.na():

    zz <- merge(df1, df2, all = TRUE)
    zz[is.na(zz)] <- 0

    > zz
      x y
    1 a 0
    2 b 1
    3 c 0
    4 d 0
    5 e 0
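Note that all = TRUE is a full outer join, so it would also keep any rows that appear only in df2. If only the rows of df1 should survive, all.x = TRUE (a left join) is an alternative. A minimal sketch, assuming a df2 with an extra key 'z' that is not part of the original example:

```r
df1 <- data.frame(x = c("a", "b", "c", "d", "e"))
# 'z' is a made-up extra key, added only to show the difference from all = TRUE
df2 <- data.frame(x = c("a", "b", "c", "z"), y = c(0, 1, 0, 1))

# Left join: keep every row of df1, drop rows that exist only in df2
left <- merge(df1, df2, by = "x", all.x = TRUE)

# Replace the NAs in the value column only, leaving the key column untouched
left$y[is.na(left$y)] <- 0
```

With the question's original data both calls give the same result, because every key in df2 also occurs in df1.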

Updated many years later to address a follow-up question

You need to identify the column names in the second data frame that you aren't merging on, so that NA is replaced only in those columns and not in the key; I use setdiff() for this. Check out the following:

    df1 = data.frame(x=c('a', 'b', 'c', 'd', 'e', NA))
    df2 = data.frame(x=c('a', 'b', 'c'),y1 = c(0,1,0), y2 = c(0,1,0))

    #merge as before
    df3 <- merge(df1, df2, all = TRUE)
    #columns in df2 not in df1
    unique_df2_names <- setdiff(names(df2), names(df1))
    df3[unique_df2_names][is.na(df3[, unique_df2_names])] <- 0

Created on 2019-01-03 by the reprex package (v0.2.1)
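The setdiff() approach can be wrapped in a small function if it is needed more than once. The sketch below is not from the original answer: merge_fill_zero is a hypothetical name, and it assumes the value columns are numeric and that the two data frames share no column names other than the merge keys (otherwise merge() would add suffixes and the names would no longer match). It loops over columns so that it also works when df2 has a single value column.

```r
# Hypothetical helper: left-join `right` onto `left` and fill unmatched values with 0
merge_fill_zero <- function(left, right, by = intersect(names(left), names(right))) {
  merged <- merge(left, right, by = by, all.x = TRUE)
  # Columns contributed by `right`, i.e. everything except the merge keys
  fill_cols <- setdiff(names(right), by)
  for (col in fill_cols) {
    unmatched <- is.na(merged[[col]])
    merged[[col]][unmatched] <- 0  # assumes a numeric column
  }
  merged
}

df1 <- data.frame(x = c("a", "b", "c", "d", "e", NA))
df2 <- data.frame(x = c("a", "b", "c"), y1 = c(0, 1, 0), y2 = c(0, 1, 0))
res <- merge_fill_zero(df1, df2)
```

The NA in the key column x is left as it is; only y1 and y2 are filled.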

Chase answered Oct 10 '22 10:10