
Filter rows within data.table group if max group value > some value [duplicate]

Tags: r, data.table

I am trying to keep all rows of a group in a data.table only if the maximum value within that group satisfies a condition (here, max(x) < 11). Below is how I would do it in dplyr, and how I got it working in two steps in data.table.

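library(data.table)
library(dplyr)

# dplyr
df <- data.table(
  x = 1:12,
  y = 1:3   # recycled to length 12
)

df %>%
  group_by(y) %>%
  filter(max(x) < 11)

# data.table (two steps: add a helper column by reference, then filter on it)
df[, max_value := max(x), by = y][max_value < 11]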

The output should be

    x y
1:  1 1 
2:  4 1 
3:  7 1 
4: 10 1

Is there a way to do this in one step, without creating the extra column in my dataset? Everything I have found so far subsets a group to get one specific value within it, rather than returning all rows of the groups that meet the condition.

Andrew Troiano asked Aug 28 '19 18:08




1 Answer

We can use .I to get the row indices of the groups that satisfy the condition, extract the resulting index column (V1), and use it to subset the original table:

df[df[, .I[max(x) < 11], y]$V1]
#    x y
#1:  1 1
#2:  4 1
#3:  7 1
#4: 10 1
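
To see why this works, it may help to look at the intermediate result on its own. The sketch below rebuilds the question's table under the illustrative name dt (an assumption, chosen so the helper column added to df in the question does not interfere) and runs the inner and outer steps separately.

```r
library(data.table)

# Rebuild the example table; rep(1:3, 4) spells out the values the
# question obtains by recycling y = 1:3
dt <- data.table(x = 1:12, y = rep(1:3, 4))

# Inner step: within each group, .I holds that group's row numbers in dt.
# Indexing .I with a single TRUE keeps all of them; a single FALSE keeps none.
idx <- dt[, .I[max(x) < 11], by = y]
print(idx)   # columns: y and V1 (the kept row numbers)

# Outer step: subset the original table with the kept row numbers
res <- dt[idx$V1]
print(res)
```

Groups whose maximum fails the test contribute zero row numbers, so only the rows of the passing groups reach the outer subset.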

Another option is to subset .SD (the data for the current group) by the same condition:

df[, .SD[max(x) < 11], y]
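
A closely related idiom, which is not part of the answer above, is to return .SD only when the group passes the test. The sketch below compares the two forms on a fresh copy of the example data; the names dt, via_sd and via_if are illustrative assumptions.

```r
library(data.table)

dt <- data.table(x = 1:12, y = rep(1:3, 4))

# Form from the answer: index .SD with a single TRUE or FALSE per group
via_sd <- dt[, .SD[max(x) < 11], by = y]

# Alternative: return the whole group if the condition holds; a failing
# group evaluates to NULL and contributes no rows
via_if <- dt[, if (max(x) < 11) .SD, by = y]

# With by =, the grouping column comes first, so the column order is y, x
print(via_if)
stopifnot(identical(via_sd$x, via_if$x), identical(via_sd$y, via_if$y))

# Optional: restore the original column order (modifies via_if by reference)
setcolorder(via_if, c("x", "y"))
```

Note that both forms put the grouping column first in the result, unlike the .I approach, which preserves the original column order.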
akrun answered Oct 19 '22 00:10