
How to pass a different argument to each group when grouping a data.table?

Tags: r, data.table

Example:

Here is a data table called dt:

> library(data.table)
> dt <- data.table(colA=rep(letters[1:3],each=3), colB=0:8)
> dt
   colA colB
1:    a    0
2:    a    1
3:    a    2
4:    b    3
5:    b    4
6:    b    5
7:    c    6
8:    c    7
9:    c    8

I want to know:

For colA equal to "a", are there any values in colB > 2?

For colA equal to "b", are there any values in colB > 3?

For colA equal to "c", are there any values in colB > 4?

I created a vector called arg to hold the arguments for groups "a", "b" and "c":

arg <- c(2,3,4)

Is there a simple way to pass arg to the grouping of dt by colA, so that each group is compared against its own value?

Here is my desired result:

     colA    V1
  1:    a FALSE
  2:    b  TRUE
  3:    c  TRUE

This is my first question here and I tried to make it simple. Thank you in advance.

user3724375 asked Jun 10 '14 at 04:06




1 Answer

For each subgroup that it operates on, [.data.table() stores information about the current value(s) of the grouping variable(s) in a variable named .BY.
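To make the role of .BY more concrete, here is a minimal sketch that inspects it while grouping the dt from the question. The names by_info, by_class and by_value are made up for this illustration and are not part of data.table:

```r
library(data.table)

# Same table as in the question
dt <- data.table(colA = rep(letters[1:3], each = 3), colB = 0:8)

# Inside j, .BY is a list with one element per grouping column.
# Here we record its class and the current value of colA for each group.
# (by_info, by_class and by_value are illustrative names only.)
by_info <- dt[, .(by_class = class(.BY), by_value = .BY$colA), by = colA]
by_info
```

Because .BY is a list, it has to be unwrapped, for example with unlist(.BY) or .BY$colA, before it can be used to index into an ordinary vector.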

If you first set up a named vector that maps the grouping variable's levels to the desired parameter values, you can use .BY to index into it, extracting the appropriate values, like so:

arg <- setNames(c(2, 3, 4), c("a", "b", "c"))
arg
# a b c 
# 2 3 4

dt[, any(colB > arg[unlist(.BY)]), by="colA"]
#    colA    V1
# 1:    a FALSE
# 2:    b  TRUE
# 3:    c  TRUE
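
As a possible alternative, not taken from the answer above, the per-group values could be kept in a small lookup table and joined onto dt before grouping. This is only a sketch: the table name thresholds and the column name threshold are assumptions introduced for the example.

```r
library(data.table)

# Same table as in the question
dt <- data.table(colA = rep(letters[1:3], each = 3), colB = 0:8)

# Hypothetical lookup table: one threshold per level of colA
thresholds <- data.table(colA = c("a", "b", "c"), threshold = c(2, 3, 4))

# Join the threshold onto every row of dt, then test each group
# against its own threshold
res <- dt[thresholds, on = "colA"][, .(V1 = any(colB > threshold)), by = colA]
res
```

This variant matches groups by the key column rather than by vector names, which may be easier to maintain when the per-group parameters already live in a table.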
Josh O'Brien answered Oct 13 '22 at 00:10
