
SQL's `case when ...` code conversion using data.table package in R

Tags: sql, r, data.table

I am attempting to convert SQL code to R. The data has around 35 million records with 200 columns each, so the best choice I could find is the data.table package.

Here is the problem. In SQL I can perform an operation such as this:

select order_date,
       sum(case when item in ('D','C','B') then col4 end) as col1,
       sum(case when item not in ('Z','X','Y') then col4 end) as col2
from datatable
where col3 < 25
group by order_date;

The query above lets me compute both conditional sums for each date in a single pass. I am unable to reproduce it in data.table. My attempts are as follows.

grp1 <- c("D","C","B")
grp2 <- c("Z","X","Y")
d1 <- dat[item %in% grp1, .(col1 = sum(col4, na.rm = TRUE)), by = Order_Date]
d2 <- dat[item %in% grp2, .(col2 = sum(col4, na.rm = TRUE)), by = Order_Date]
d3 <- data.table(d1,d2)

Because each call subsets the rows before grouping, d1 and d2 can end up with different sets of dates, so they cannot simply be combined side by side.

Shoaibkhanz asked Aug 08 '15 17:08



1 Answer

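You can try the following. It applies the `col3` filter once in `i`, then subsets `col4` inside `j`, so both sums are computed over the same date groups: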

DT[col3 < 25,
   .(col1 = sum(col4[item %in% c("D","C","B")]),
     col2 = sum(col4[!item %in% c("Z","X","Y")])),
   by = .(order_date)]
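
To see how this behaves, here is a minimal sketch on a small made-up table. The column names follow the question, but the values are invented purely for illustration. The `na.rm = TRUE` argument is an addition, assuming you want to mimic SQL's `SUM`, which skips NULLs.

```r
library(data.table)

# Toy data: values are hypothetical, only the column names come from the question.
DT <- data.table(
  order_date = as.Date(c(rep("2015-08-01", 4), rep("2015-08-02", 4))),
  item       = c("D", "Z", "A", "B", "C", "A", "B", "X"),
  col3       = c(10, 20, 15, 30, 5, 24, 12, 3),
  col4       = c(1, 2, 4, 64, 8, NA, 16, 32)
)

grp1 <- c("D", "C", "B")
grp2 <- c("Z", "X", "Y")

# Filter rows once in i, then pick the relevant col4 values per condition in j.
res <- DT[col3 < 25,
          .(col1 = sum(col4[item %in% grp1], na.rm = TRUE),
            col2 = sum(col4[!item %in% grp2], na.rm = TRUE)),
          by = order_date]

print(res)
```

Two differences from the SQL are worth keeping in mind. When no row in a group matches a condition, `sum()` in R returns 0, whereas SQL's `SUM` over only NULLs returns NULL. Also, a row whose `item` is `NA` passes `!item %in% grp2` in R, while `NULL not in (...)` does not match in SQL.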
jangorecki answered Nov 14 '22 03:11
