I'm using the data.table package to speed up some summary statistic collection on a data set.
I'm curious if there's a way to group by more than one column. My data looks like this:
purchaseAmt adShown url
15.54 00001 150000001
4.82 00002 150000001
157.99 05005 776300044
... ... ...
I can do something like this:
adShownMedian <- df1[,median(purchaseAmt),by="adShown"]
to get each ad's median. How would I do something that groups by the combination of adShown and url?
I've tried this:
adShownMedian <- df1[,median(purchaseAmt),by=c("adShown","url")]
but no luck.
Any suggestions?
A related data.table feature is the .N symbol, where .N stands for "number of rows." It gives the total number of rows, or the number of rows per group when you aggregate with by. This expression returns the total number of rows in the data.table: mydt[, .N]
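.N combines naturally with a multi-column by. The sketch below uses a small hypothetical table (the values are invented, not from the question) to show both the overall row count and the per-group counts for each (Y, Z) combination:

```r
library(data.table)

# Hypothetical toy table, invented for illustration.
DT <- data.table(Y = c(0, 0, 1, 1, 1),
                 Z = c(1, 1, 2, 2, 3))

# Total number of rows in the table.
total <- DT[, .N]
print(total)

# Number of rows for each (Y, Z) combination, in order of first appearance.
counts <- DT[, .N, by = .(Y, Z)]
print(counts)
```

Here total is 5, and counts has three rows: (0, 1) with N = 2, (1, 2) with N = 2 and (1, 3) with N = 1.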
Use by=list(adShown,url) instead of by=c("adShown","url").
Example:
set.seed(007)
DF <- data.frame(X=1:20, Y=sample(c(0,1), 20, TRUE), Z=sample(0:5, 20, TRUE))
library(data.table)
DT <- data.table(DF)
DT[, Mean:=mean(X), by=list(Y, Z)]
DT

     X Y Z      Mean
 1:  1 1 3  1.000000
 2:  2 0 1  9.333333
 3:  3 0 5  7.400000
 4:  4 0 5  7.400000
 5:  5 0 5  7.400000
 6:  6 1 0  6.000000
 7:  7 0 3  7.000000
 8:  8 1 2 12.500000
 9:  9 0 5  7.400000
10: 10 0 2 15.000000
11: 11 0 4 14.500000
12: 12 0 1  9.333333
13: 13 1 1 13.000000
14: 14 0 1  9.333333
15: 15 0 2 15.000000
16: 16 0 5  7.400000
17: 17 1 2 12.500000
18: 18 0 4 14.500000
19: 19 1 5 19.000000
20: 20 0 2 15.000000
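Applied to the question's own columns, a minimal sketch might look like the following. Only the column names purchaseAmt, adShown and url come from the question; the rows are invented, the result column name medianAmt is hypothetical, and storing the IDs as character (so leading zeros such as 00001 survive) is an assumption about the real data.

```r
library(data.table)

# Hypothetical purchase data shaped like the question's df1 (values invented).
df1 <- data.table(
  purchaseAmt = c(15.54, 4.82, 157.99, 20.00, 10.00),
  adShown     = c("00001", "00002", "05005", "00001", "00001"),
  url         = c("150000001", "150000001", "776300044", "150000001", "776300044")
)

# One row per (adShown, url) combination, holding the median purchase amount.
adShownMedian <- df1[, .(medianAmt = median(purchaseAmt)), by = list(adShown, url)]
print(adShownMedian)
```

Wrapping j in .() lets you name the result column; with a bare median(purchaseAmt), data.table names it V1 instead.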
To add to Jilber Urbina's answer and address kahlo's comment:
if you want a single row for each (Y, Z) combination with the aggregated values, you can do
DT[, .(X=mean(X)), by=list(Y, Z)]
That is the same as doing
DT[, .(X=mean(X)), by=.(Y, Z)]
# or
DT[, .(X=mean(X)), by=c('Y','Z')]
# or specify the column names in a character vector
grp_cols <- c('Y','Z')
DT[, .(X=mean(X)), by=grp_cols]
(data.table version 1.12.6)
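A quick way to check that these spellings of by are interchangeable, and to see how the .() form differs from the := form in the first answer, is a sketch like this. It regenerates a DT of the same shape; none of the random values are asserted, and grp_cols is just an illustrative variable name.

```r
library(data.table)

set.seed(7)
DT <- data.table(X = 1:20,
                 Y = sample(c(0, 1), 20, TRUE),
                 Z = sample(0:5, 20, TRUE))

grp_cols <- c("Y", "Z")
r1 <- DT[, .(X = mean(X)), by = list(Y, Z)]
r2 <- DT[, .(X = mean(X)), by = .(Y, Z)]
r3 <- DT[, .(X = mean(X)), by = c("Y", "Z")]
r4 <- DT[, .(X = mean(X)), by = grp_cols]

# All four spellings of `by` should give the same aggregated table.
stopifnot(isTRUE(all.equal(r1, r2)),
          isTRUE(all.equal(r1, r3)),
          isTRUE(all.equal(r1, r4)))

# `:=` adds a column in place and keeps all 20 rows;
# `.()` in j returns one row per distinct (Y, Z) combination.
DT[, Mean := mean(X), by = .(Y, Z)]
stopifnot(nrow(DT) == 20L,
          nrow(r1) == uniqueN(DT, by = grp_cols))
```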