I have a question about aggregating a data frame and reshaping the result.
I have a table with two columns: name and category. category is a factor variable with 10 levels, '0' to '9'. The data frame looks like this:
name category
a 0
a 1
a 1
a 4
a 9
b 2
b 2
b 2
b 3
b 7
b 8
c 0
c 0
c 0
The result I want looks like this:
name category.0 category.1 category.2 category.3 category.4 ..... category.9
a 1 2 0 0 1 1
b 0 0 3 1 0 0
c 3 0 0 0 0 0
It counts how many times each of '0', '1', ..., '9' occurs for each unique name.
To generate the result, I used a simple aggregate call:
new_df <- aggregate(category ~ name, df, FUN = summary)
and then unlisted the second column of new_df to get the result.
However, this is too slow. Is there a more efficient way to do this?
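You can use dcast from the reshape2 package:
library(reshape2)
# Count rows per name/category pair; length is what dcast falls back to by default here
x <- dcast(df, name ~ category, fun.aggregate = length, value.var = "category")
# Prefix the count columns with 'category', leaving the name column untouched
setNames(x, c(names(x)[1], paste0('category', names(x)[-1])))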
# name category0 category1 category2 category3 category4 category7 category8 category9
#1 a 1 2 0 0 1 0 0 1
#2 b 0 0 3 1 0 1 1 0
#3 c 3 0 0 0 0 0 0 0
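Note that the output above has no columns for levels 5 and 6, because no row uses them. If every factor level should appear, as in the desired output in the question, a base R cross-tabulation with table() is one option. The following is only a sketch: the example data is rebuilt from the question, and the names counts, wide and result are illustrative. Whether it is faster than aggregate or dcast on your data is something you would need to benchmark.

```r
# Example data rebuilt from the question; category is a factor with levels "0" to "9".
df <- data.frame(
  name = c(rep("a", 5), rep("b", 6), rep("c", 3)),
  category = factor(c(0, 1, 1, 4, 9, 2, 2, 2, 3, 7, 8, 0, 0, 0), levels = 0:9)
)

# Cross-tabulate name by category; unused factor levels are kept as all-zero columns.
counts <- table(df$name, df$category)

# Turn the 2-D table into a wide data frame (one row per name, one column per level).
wide <- as.data.frame.matrix(counts)
names(wide) <- paste0("category.", names(wide))

# Move the row names into a regular name column.
result <- cbind(name = rownames(wide), wide)
rownames(result) <- NULL
result
```

Because table() works on the factor levels rather than on the observed values, the result always has all ten category columns.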