
data.table WHERE before BY

Tags: r, data.table

I have the following problem, which probably has a fairly simple solution. Given this table and the query below it:

library(data.table)
actions = data.table(User_id = c("Carl","Carl","Carl","Lisa","Moe"),
                     category = c(1,1,2,2,1),
                     value= c(10,20,30,40,50))

   User_id category value
1:    Carl        1    10
2:    Carl        1    20
3:    Carl        2    30
4:    Lisa        2    40
5:     Moe        1    50

actions[category==1,sum(value),by= User_id]

The problem is that the rows where category is 1 are apparently selected first, and only then is by applied. Users with no such rows are dropped, so what I get is:

   User_id V1
1:    Carl 30
2:     Moe 50

But what I want is:

   User_id V1
1:    Carl 30
2:    Lisa  0
3:     Moe 50
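The dropped row follows from the order of evaluation: the filter in the first argument is applied before grouping. A minimal sketch that splits the query into two steps (the names filtered, two_step and one_step are only for illustration):

```r
library(data.table)

actions <- data.table(User_id  = c("Carl", "Carl", "Carl", "Lisa", "Moe"),
                      category = c(1, 1, 2, 2, 1),
                      value    = c(10, 20, 30, 40, 50))

# Step 1: keep only rows with category == 1. Lisa has no such row.
filtered <- actions[category == 1]

# Step 2: group what is left. Lisa cannot appear, because she was
# already removed in step 1.
two_step <- filtered[, sum(value), by = User_id]

# The original one-line query, for comparison.
one_step <- actions[category == 1, sum(value), by = User_id]

print(two_step)
print(one_step)
```

Both results should contain only Carl and Moe, which is why a zero for Lisa has to come from somewhere other than the filter.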

I am building a data.table that contains just one row of information per user, like this:

users = actions[,User_id,by= User_id]
users$value_one = actions[category==1,.(value_one =sum(value)),by= User_id]$value_one

This throws errors or assigns wrong values whenever some users have no entry in the filtered result.
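The underlying mismatch can be seen by comparing lengths. This is only a sketch: it builds the user table with unique(), which is an assumption made for illustration and not the code from the question.

```r
library(data.table)

actions <- data.table(User_id  = c("Carl", "Carl", "Carl", "Lisa", "Moe"),
                      category = c(1, 1, 2, 2, 1),
                      value    = c(10, 20, 30, 40, 50))

# One row per user (illustrative alternative to the question's code).
users <- unique(actions[, .(User_id)])

# The filtered aggregate only has entries for users with category == 1.
value_one <- actions[category == 1, .(value_one = sum(value)), by = User_id]$value_one

nrow(users)        # 3 users
length(value_one)  # 2 sums, so a positional assignment cannot line up
```

Because the vector is assigned by position rather than matched by User_id, any missing user shifts or breaks the assignment.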

Marvins.seins asked May 18 '16 15:05


1 Answer

This is almost as succinct, and gets the job done.

actions[, .SD[category==1, sum(value)], by=User_id]
#    User_id V1
# 1:    Carl 30
# 2:    Lisa  0
# 3:     Moe 50

## Or, better yet, with no need to muck around with .SD (h/t David Arenburg)
actions[, sum(value[category == 1]), by = User_id]
#    User_id V1
# 1:    Carl 30
# 2:    Lisa  0
# 3:     Moe 50
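Why Lisa gets 0 here: within her group the subset value[category == 1] is empty, and in R the sum of an empty numeric vector is 0. A small sketch using Lisa's single row (the variable names are illustrative):

```r
# Lisa's group as the grouped expression sees it.
lisa_value    <- c(40)
lisa_category <- c(2)

# No element has category 1, so the subset is a zero-length vector.
selected <- lisa_value[lisa_category == 1]
length(selected)  # 0

# sum() of a zero-length numeric vector is 0, so the group is kept
# with a total of 0 instead of being dropped.
lisa_total <- sum(selected)
lisa_total
```

The same approach with a function such as max() or mean() would behave differently on an empty subset, so the zero is specific to sum().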

If the relative inefficiency of the above is a problem in your use case, here's a more efficient alternative:

res <- actions[, .(val=0), by=User_id]
res[actions[category==1, .(val=sum(value)), by=User_id], val:=i.val, on="User_id"]
res
#    User_id val
# 1:    Carl  30
# 2:    Lisa   0
# 3:     Moe  50
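The update join above can be read in three steps. The sketch below spells them out with an intermediate table named sums, a name introduced here only for illustration:

```r
library(data.table)

actions <- data.table(User_id  = c("Carl", "Carl", "Carl", "Lisa", "Moe"),
                      category = c(1, 1, 2, 2, 1),
                      value    = c(10, 20, 30, 40, 50))

# 1. One row per user, with a default of 0.
res <- actions[, .(val = 0), by = User_id]

# 2. Sums for the users that actually have category == 1 rows.
sums <- actions[category == 1, .(val = sum(value)), by = User_id]

# 3. Update join: for each row of sums, find the matching User_id in res
#    and overwrite val by reference. The i. prefix refers to the column
#    from the table passed as the first argument (sums). Users without a
#    match keep the default.
res[sums, val := i.val, on = "User_id"]

print(res)
```

Because the match is made on User_id and not on position, Lisa keeps her default of 0 while Carl and Moe receive their sums.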
Josh O'Brien answered Sep 18 '22 13:09