
Efficient way to do a double aggregation on a data frame

I have a question about aggregating a data frame twice, which also involves reformatting the table.

I have a table with two columns: name and category. category is a factor variable with 10 levels, say '0' to '9'. The data frame looks like this:

name   category
a        0
a        1
a        1
a        4
a        9
b        2
b        2
b        2
b        3
b        7
b        8
c        0
c        0
c        0
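
For anyone who wants to reproduce this, the sample above could be rebuilt roughly as follows. This is a minimal sketch: it assumes category is a factor whose levels are exactly '0' to '9', as described, and that name is a plain character column.

```r
# Rebuild the example data. 'category' is declared with all ten levels,
# including levels 5 and 6, which never occur in the sample rows.
df <- data.frame(
  name = c(rep("a", 5), rep("b", 6), rep("c", 3)),
  category = factor(c(0, 1, 1, 4, 9, 2, 2, 2, 3, 7, 8, 0, 0, 0), levels = 0:9)
)

str(df)
```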

The result I want looks like this:

name category.0  category.1  category.2 category.3 category.4 ..... category.9
a        1           2            0         0           1               1
b        0           0            3         1           0               0            
c        3           0            0         0           0               0

It counts how many times each of '0', '1', ..., '9' occurs for each unique name.

To generate the result, I used a simple aggregate call:

new_df <- aggregate(category ~ name, df, FUN = summary)

and then unlisted the second column of new_df to get the result.
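
As a sketch of what this step amounts to: summary() on a factor returns one count per level, so aggregate() stores the counts as a single matrix column named category. The exact flattening call is not shown in the question, so do.call(data.frame, ...) below is an assumption, as is the reconstructed df.

```r
# Example data as described in the question (reconstruction, not the real data)
df <- data.frame(
  name = c(rep("a", 5), rep("b", 6), rep("c", 3)),
  category = factor(c(0, 1, 1, 4, 9, 2, 2, 2, 3, 7, 8, 0, 0, 0), levels = 0:9)
)

# One row per name; 'category' is a matrix column with one column per level
new_df <- aggregate(category ~ name, df, FUN = summary)

# Spread the matrix column into ordinary columns category.0 ... category.9
flat <- do.call(data.frame, new_df)
names(flat)
```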

However, this is too slow. I would like to know if there is a more efficient way to do this.

zxwjames asked Mar 06 '26 22:03



1 Answer

You can use dcast from package reshape2:

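library(reshape2)

# One row per name, one column per category; each cell is a row count
x = dcast(df, name ~ category, fun.aggregate = length)
# Prefix the count columns with 'category', leaving 'name' as is
setNames(x, c(names(x)[1], paste0('category', names(x)[-1])))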

#  name category0 category1 category2 category3 category4 category7 category8 category9
#1    a         1         2         0         0         1         0         0         1
#2    b         0         0         3         1         0         1         1         0
#3    c         3         0         0         0         0         0         0         0
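
The output above has no columns for levels 5 and 6, which never occur in the sample. If every factor level should appear, and the dotted column names from the question are wanted, one base R option is table(). The following is only a sketch, not part of the original answer, and the df it uses is a reconstruction of the question's sample data.

```r
# Example data as described in the question (reconstruction, not the real data)
df <- data.frame(
  name = c(rep("a", 5), rep("b", 6), rep("c", 3)),
  category = factor(c(0, 1, 1, 4, 9, 2, 2, 2, 3, 7, 8, 0, 0, 0), levels = 0:9)
)

# Cross-tabulate names against categories; table() keeps unused factor levels
counts <- table(df$name, df$category)

# Convert the contingency table to a wide data frame and label the columns
res <- as.data.frame.matrix(counts)
names(res) <- paste0("category.", names(res))
res <- cbind(name = rownames(res), res)
rownames(res) <- NULL

res
```

Whether this is faster than dcast on the real data would need to be measured, for example with system.time().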
Colonel Beauvel answered Mar 09 '26 10:03



