Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R data.table group by multiple columns into 1 column and sum

I have the following data.table:

> dt = data.table(sales_ccy = c("USD", "EUR", "GBP", "USD"), sales_amt = c(500,600,700,800), cost_ccy = c("GBP","USD","GBP","USD"), cost_amt = c(-100,-200,-300,-400))
> dt
   sales_ccy sales_amt cost_ccy cost_amt
1:       USD       500      GBP     -100
2:       EUR       600      USD     -200
3:       GBP       700      GBP     -300
4:       USD       800      USD     -400

My aim is to get the following data.table:

> dt
   ccy total_amt
1: EUR       600
2: GBP       300
3: USD       700

Basically, I want to sum all costs and sales together by currency. In reality, this data.table has >500,000 rows so I would want a fast and efficient way to sum the amounts together.

Any idea of a fast way to do this?

like image 445
chengcj Avatar asked Apr 24 '15 10:04

chengcj


1 Answers

Using data.table v1.9.6+ which has improved version of melt which can melt in to multiple columns simultaneously,

require(data.table) # v1.9.6+
melt(dt, measure = patterns("_ccy$", "_amt$")
    )[, .(tot_amt = sum(value2)), keyby = .(ccy=value1)]
like image 56
Arun Avatar answered Nov 15 '22 09:11

Arun