
Can you use a data.frame twice in a dplyr chain? dplyr says "Error: cannot handle"

Tags:

r

dplyr

I am trying to use a data.frame twice in a dplyr chain. Here is a simple example that gives an error:

library(dplyr)

df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

df %>%
  group_by(Type) %>%
  summarize(X = n()) %>%
  mutate(df %>%
           filter(Value > 2) %>%
           group_by(Type) %>%
           summarize(Y = sum(Value)))

Error: cannot handle

The data.frame has two columns: Value, which is just some data, and Type, which indicates the group each value belongs to.

I first use summarize to get the number of rows in each group, and then mutate, using df a second time, to get the sum of the values after the data has been filtered. However, I get Error: cannot handle. Any ideas what is happening here?

Desired Output:

Type X Y
  A  5 24
  B  5 28

John Paul asked Jan 29 '26 22:01


1 Answer

You could try the following:

df %>% 
  group_by(Type) %>% 
  summarise(X = n(), Y = sum(Value[Value > 2]))

# Source: local data frame [2 x 3]
# 
#   Type X  Y
# 1    A 5 24
# 2    B 5 28

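The idea is to filter only Value by the desired condition, instead of filtering the whole data set. That way X still counts every row in each group, while Y sums only the values greater than 2.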
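
If you do want to keep two separate pipelines on df, as the question attempts, one alternative is to compute each summary on its own and join the results on Type. This is only a minimal sketch and not part of the approach above; the names counts, sums and result are illustrative.

```r
library(dplyr)

df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

# Row count per group, computed on the unfiltered data
counts <- df %>%
  group_by(Type) %>%
  summarise(X = n())

# Sum of Value per group, computed after filtering
sums <- df %>%
  filter(Value > 2) %>%
  group_by(Type) %>%
  summarise(Y = sum(Value))

# Combine the two summaries by their shared key
result <- left_join(counts, sums, by = "Type")
result
```

One difference to keep in mind: if a group has no rows left after the filter, the join gives NA for Y in that group, whereas sum(Value[Value > 2]) gives 0.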


And a bonus solution using data.table:

library(data.table)
setDT(df)[, .(X = .N, Y = sum(Value[Value > 2])), by = Type]
#    Type X  Y
# 1:    A 5 24
# 2:    B 5 28

I was going to suggest that to @nongkrong, but he deleted his post. With base R we could also do:

aggregate(Value ~ Type, df, function(x) c(length(x), sum(x[x>2])))
#   Type Value.1 Value.2
# 1    A       5      24
# 2    B       5      28
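
Note that aggregate() stores the two results in a single matrix column called Value, which is why the headers print as Value.1 and Value.2. If ordinary columns are preferred, one possible sketch is to name the returned values and flatten the result with do.call(data.frame, ...). The names X, Y, res and flat are my own choices for illustration.

```r
df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

# Naming the returned vector gives the matrix column readable column names
res <- aggregate(Value ~ Type, df,
                 function(x) c(X = length(x), Y = sum(x[x > 2])))

# res$Value is a two-column matrix; this splits it into separate columns
flat <- do.call(data.frame, res)
names(flat)
```

The flattened columns are prefixed with the original column name, so they can be renamed afterwards if plain X and Y are wanted.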

David Arenburg answered Jan 31 '26 13:01