dplyr group_by and summarize for two data frames with the same column name

Tags: dataframe, r, dplyr

Suppose you have the following two data frames:

set.seed(1)
x <- letters[1:10]
df1 <- data.frame(x)
z <- rnorm(20,100,10)
df2 <- data.frame(x,z)

(Note that both data frames have a column named "x".)

You want to compute the sum of df2$z for each group of "x" in df1, like this:

df1 %.%
  group_by(x) %.%
  summarize(
    z = sum(df2$z[df2$x == x]) 
   )

This returns an error, "invalid indextype integer" (translated).

But when I rename column "x" in either of the two data frames, it works:

df2 <- data.frame(x1 = x, z)  # column is now named "x1"; renaming the column in df1 instead also works

df1 %.%
   group_by(x) %.%
   summarize(
     z = sum(df2$z[df2$x1 == x]) 
   )

#   x        z
#1  a 208.8533
#2  b 205.7349
#3  c 185.4313
#4  d 193.8058
#5  e 214.5444
#6  f 191.3460
#7  g 204.7124
#8  h 216.8216
#9  i 213.9700
#10 j 202.8851

I can imagine many situations where two data frames share a column name (an "ID" column, for example), so this could be a real problem unless there is a simple way around it.

Did I miss something? There may be other ways to get the same result for this example, but I'm interested in understanding whether this is possible in dplyr (or, if not, why not).

(The two data frames don't necessarily need to have the same unique "x" values as in this example.)

asked Oct 21 '22 09:10 by talat


2 Answers

Following the comment from @beginneR, I'm guessing it'd be something like:

inner_join(df1, df2) %.% group_by(x) %.% summarise(z=sum(z))

Joining by: "x"
Source: local data frame [10 x 2]

   x        z
1  a 208.8533
2  b 205.7349
3  c 185.4313
4  d 193.8058
5  e 214.5444
6  f 191.3460
7  g 204.7124
8  h 216.8216
9  i 213.9700
10 j 202.8851
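
For readers on a current dplyr release: the `%.%` operator used above was later replaced by `%>%`. The following is a minimal sketch of the same join-then-summarise approach, assuming a recent dplyr version. Passing `by = "x"` explicitly and storing the output in `result` are additions made here for illustration, not part of the original answer.

```r
library(dplyr)

# Same example data as in the question
set.seed(1)
x <- letters[1:10]
df1 <- data.frame(x)
df2 <- data.frame(x, z = rnorm(20, 100, 10))

# Join on the shared key column, then sum z within each group of x
result <- inner_join(df1, df2, by = "x") %>%
  group_by(x) %>%
  summarise(z = sum(z))

print(result)
```

Because `inner_join()` keeps only keys present in both tables, any "x" in df1 without rows in df2 drops out of the result. Using `left_join()` together with `sum(z, na.rm = TRUE)` should keep such groups with a sum of 0 instead.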
answered Oct 23 '22 02:10 by Arun


You can try:

df2 %.% filter(x %in% df1$x) %.% group_by(x) %.% summarise(sum(z))
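
A sketch of the same filter-then-summarise idea in current dplyr syntax, assuming `%>%` in place of the older `%.%`. Naming the summary column `z` and storing the output in `result` are additions made here for illustration.

```r
library(dplyr)

# Same example data as in the question
set.seed(1)
x <- letters[1:10]
df1 <- data.frame(x)
df2 <- data.frame(x, z = rnorm(20, 100, 10))

# Keep only the rows of df2 whose key occurs in df1, then sum z per group.
# Inside filter(), the bare x refers to df2's column; df1$x is spelled out.
result <- df2 %>%
  filter(x %in% df1$x) %>%
  group_by(x) %>%
  summarise(z = sum(z))

print(result)
```

Unlike a join, this does not duplicate rows of df2 when df1 contains repeated keys. `semi_join(df2, df1, by = "x")` should be an equivalent way to write the filter step.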

Hope this helps.

answered Oct 23 '22 01:10 by droopy
