I'm using the <code>data.table</code> package to speed up some summary statistic collection on a data set. I'm curious if there's a way to group by more than one column. My data looks like this: <pre class="prettyprint"><code> purchaseAmt adShown url 15.54 00001 150000001 4.82 00002 150000001 157.99 05005 776300044 ... ... ... </code></pre> I can do something like this: <pre class="prettyprint"><code>adShownMedian <- df1[,median(purchaseAmt),by="adShown"] </code></pre> to get each ad's median. How would I do something that combines <code>adShown</code> and <code>url</code>? I've tried this: <pre class="prettyprint"><code>adShownMedian <- df1[,median(purchaseAmt),by=c("adShown","url")] </code></pre> but no luck. Any suggestions?

To add on Jilber Urbina answer, and address kahlo comment: if you want to get a single row for each Y - Z combination with the aggregated values you can do <pre class="prettyprint"><code>DT[, .(X=mean(X)), by=list(Y, Z)] </code></pre> that is the same as doing <pre class="prettyprint"><code>DT[, .(X=mean(X)), by=.(Y, Z)] # or DT[, .(X=mean(X)), by=c('Y','Z')] # or specify column names in vector names = c('Y','Z') DT[, .(X=mean(X)), by=names] </code></pre> (data.table version 1.12.6)

How to group data.table by multiple columns?

Tags:

r

group-by

data.table

I'm using the data.table package to speed up some summary statistic collection on a data set.

I'm curious if there's a way to group by more than one column. My data looks like this:

  purchaseAmt        adShown        url
   15.54            00001         150000001
    4.82            00002         150000001
  157.99            05005         776300044
   ...               ...            ...

I can do something like this:

adShownMedian <- df1[,median(purchaseAmt),by="adShown"]

to get each ad's median. How would I do something that combines adShown and url?

I've tried this:

adShownMedian <- df1[,median(purchaseAmt),by=c("adShown","url")]

but no luck.

Any suggestions?

323

asked Sep 18 '12 14:09

screechOwl

Video Answer

2 Answers

Use by=list(adShown,url) instead of by=c("adShown","url")

Example:

set.seed(007)  DF <- data.frame(X=1:20, Y=sample(c(0,1), 20, TRUE), Z=sample(0:5, 20, TRUE))  library(data.table) DT <- data.table(DF) DT[, Mean:=mean(X), by=list(Y, Z)]        X Y Z      Mean  1:  1 1 3  1.000000  2:  2 0 1  9.333333  3:  3 0 5  7.400000  4:  4 0 5  7.400000  5:  5 0 5  7.400000  6:  6 1 0  6.000000  7:  7 0 3  7.000000  8:  8 1 2 12.500000  9:  9 0 5  7.400000 10: 10 0 2 15.000000 11: 11 0 4 14.500000 12: 12 0 1  9.333333 13: 13 1 1 13.000000 14: 14 0 1  9.333333 15: 15 0 2 15.000000 16: 16 0 5  7.400000 17: 17 1 2 12.500000 18: 18 0 4 14.500000 19: 19 1 5 19.000000 20: 20 0 2 15.000000

answered Sep 27 '22 19:09

Jilber Urbina

To add on Jilber Urbina answer, and address kahlo comment:
if you want to get a single row for each Y - Z combination with the aggregated values you can do

DT[, .(X=mean(X)), by=list(Y, Z)]

that is the same as doing

DT[, .(X=mean(X)), by=.(Y, Z)] 
# or
DT[, .(X=mean(X)), by=c('Y','Z')]
# or specify column names in vector
names = c('Y','Z')
DT[, .(X=mean(X)), by=names]

(data.table version 1.12.6)

answered Sep 27 '22 18:09

Fernando.-

Related questions
                            
                                dplyr filter: Get rows with minimum of variable, but only the first if multiple minima
                            
                                How to save data file into .RData?
                            
                                Any way to make plot points in scatterplot more transparent in R?
                            
                                Split string based on alternating character in R
                            
                                Windows 7, update.packages problem: "unable to move temporary installation"?
                            
                                Combine base and ggplot graphics in R figure window
                            
                                Add multiple columns to R data.table in one function call?
                            
                                How to leave the R browser() mode in the console window?
                            
                                R: 2 functions with the same name in 2 different packages
                            
                                How can I print when using %dopar%
                            
                                How to declare a vector of zeros in R
                            
                                Merge two data frames while keeping the original row order
                            
                                Understanding `scale` in R
                            
                                Using ggplot2, can I insert a break in the axis?
                            
                                Round up from .5
                            
                                Multiply rows of matrix by vector?
                            
                                Keeping trailing zeros
                            
                                Append data frames together in a for loop
                            
                                R: losing column names when adding rows to an empty data frame
                            
                                How to tell CRAN to install package dependencies automatically?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With