I have the following problem, which probably has a pretty simple solution. I have this data.table:
library(data.table)
actions = data.table(User_id = c("Carl","Carl","Carl","Lisa","Moe"),
                     category = c(1,1,2,2,1),
                     value = c(10,20,30,40,50))
   User_id category value
1:    Carl        1    10
2:    Carl        1    20
3:    Carl        2    30
4:    Lisa        2    40
5:     Moe        1    50
actions[category==1,sum(value),by= User_id]
The problem with the query above is that it apparently first filters out the rows where category is 1 and only then applies the by grouping, so users with no matching rows disappear. What I get is:
   User_id V1
1:    Carl 30
2:     Moe 50
But what I want is:
   User_id V1
1:    Carl 30
2:    Lisa  0
3:     Moe 50
I am building a data.table that contains only per-user information, like this:
users = actions[,User_id,by= User_id]
users$value_one = actions[category==1,.(value_one =sum(value)),by= User_id]$value_one
which throws errors or fills in wrong values when some users have no entry for that category.
This is almost as succinct, and gets the job done.
actions[, .SD[category==1, sum(value)], by=User_id]
#    User_id V1
# 1:    Carl 30
# 2:    Lisa  0
# 3:     Moe 50
## Or, better yet, no need to muck around with .SD (h/t David Arenburg)
actions[, sum(value[category == 1]), by = User_id]
#    User_id V1
# 1:    Carl 30
# 2:    Lisa  0
# 3:     Moe 50
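Both forms keep Lisa because the subsetting happens inside each group rather than before grouping, and sum() of a zero-length numeric vector is 0. A minimal sketch of that behavior on the same actions table (the names filtered_first, lisa_values and per_user are only illustrative, not part of the original code):

```r
library(data.table)

actions <- data.table(User_id  = c("Carl", "Carl", "Carl", "Lisa", "Moe"),
                      category = c(1, 1, 2, 2, 1),
                      value    = c(10, 20, 30, 40, 50))

# Filtering in i drops Lisa's rows before grouping, so her group never exists.
filtered_first <- actions[category == 1, sum(value), by = User_id]

# Filtering inside j keeps every group; Lisa's subset is simply empty.
lisa_values <- actions[User_id == "Lisa", value[category == 1]]
length(lisa_values)  # zero-length numeric vector
sum(lisa_values)     # sum() of an empty numeric vector is 0

# Same idea as the one-liner above, with a named result column.
per_user <- actions[, .(V1 = sum(value[category == 1])), by = User_id]
per_user
```

Note that this only gives a sensible default for sum(); a function such as max() or mean() would need its own handling of the empty case.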
If the relative inefficiency of the above is a problem in your use case, here's a more efficient alternative:
res <- actions[, .(val=0), by=User_id]
res[actions[category==1, .(val=sum(value)), by=User_id], val:=i.val, on="User_id"]
res
#    User_id val
# 1:    Carl  30
# 2:    Lisa   0
# 3:     Moe  50
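The same update join can be used to fill the users table from the question. This is a sketch, assuming users should hold one row per user and that a user with no category 1 rows should get 0; value_one is the column name from the question, while totals is just an illustrative name:

```r
library(data.table)

actions <- data.table(User_id  = c("Carl", "Carl", "Carl", "Lisa", "Moe"),
                      category = c(1, 1, 2, 2, 1),
                      value    = c(10, 20, 30, 40, 50))

# One row per user, with the default value filled in up front.
users <- unique(actions[, .(User_id)])
users[, value_one := 0]

# Per-user totals for category 1 only (Lisa does not appear here).
totals <- actions[category == 1, .(value_one = sum(value)), by = User_id]

# Update join: overwrite value_one only for users that appear in totals.
users[totals, value_one := i.value_one, on = "User_id"]
users
```

Because the assignment is matched on User_id rather than on row position, users missing from totals keep their default instead of shifting the other values.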