
How to get every nth element from each group in a grouped data frame

I have a large data frame that is grouped with dplyr by a group-name column, so multiple rows share the same group name. To reduce the data, I would like to extract every nth element from each group, starting from the first element of that group. Is there an R way to do this without loops?

Subsetting rows with a single sequence over the whole data frame has the problem that the first row of a group is often missed, e.g.:

data[seq(1, nrow(data), 10), ] # Some groups start without the first row.

Input:

   Val Group
1  1.0 Fruit
2  2.0 Fruit
3  3.0 Fruit
4  1.5 Veg
5  2.8 Veg
6  4.2 Veg
7  5.1 Veg

Desired output (every second element; note that the 3rd row is the first row of the Veg group):

   Val Group
1  1.0 Fruit
2  3.0 Fruit
3  1.5 Veg
4  4.2 Veg
Saren Tasciyan asked Jul 16 '19 23:07



4 Answers

library(dplyr)
data %>% group_by(Group) %>%
  slice(seq(1, n(), by = 2))

This gives:

# A tibble: 4 x 2
# Groups:   Group [2]
    Val Group
  <dbl> <fct>
1   1   Fruit
2   3   Fruit
3   1.5 Veg  
4   4.2 Veg 
sumshyftw answered Oct 31 '22 17:10



Here's a base R way:

DF$ID_by_Group <- ave(DF$Val, DF$Group, FUN =  seq_along)
DF[DF$ID_by_Group %in% seq(1,3, by = 2), ]

  Val Group ID_by_Group
1 1.0 Fruit           1
3 3.0 Fruit           3
4 1.5   Veg           1
6 4.2   Veg           3

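The ave() function creates a row ID that restarts within each group. The second statement then filters on the ID_by_Group column we created. Note that seq(1, 3, by = 2) only covers the group sizes in this example; larger groups need a larger upper bound.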
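The same within-group ID extends to any step size. A minimal sketch, assuming the example data from the question; the names `n`, `idx` and `result` are illustrative and not part of the original answer:

```r
# Example data from the question
DF <- data.frame(Val = c(1.0, 2.0, 3.0, 1.5, 2.8, 4.2, 5.1),
                 Group = c("Fruit", "Fruit", "Fruit", "Veg", "Veg", "Veg", "Veg"))

n <- 2  # keep every nth row within each group (illustrative name)

# Within-group row counter: 1, 2, 3, ... restarting at each group
idx <- ave(seq_len(nrow(DF)), DF$Group, FUN = seq_along)

# Keep rows 1, n + 1, 2n + 1, ... of each group; this also works for n = 1
result <- DF[(idx - 1) %% n == 0, ]
print(result)
```

Testing `(idx - 1) %% n == 0` instead of matching against a fixed sequence avoids having to know the largest group size in advance.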

Note, we could do it all at once and/or remove the added column:

DF[ave(DF$Val, DF$Group, FUN =  seq_along) %in% seq(1,3, by = 2), ]

DF$ID_by_Group <- ave(DF$Val, DF$Group, FUN =  seq_along)

DF[DF$ID_by_Group %in% seq(1,3, by = 2), -3]
DF[DF$ID_by_Group %in% seq(1,3, by = 2), -grep('ID_by_Group', names(DF))]
DF[DF$ID_by_Group %in% seq(1,3, by = 2), c('Val', 'Group')]

#all provide:

  Val Group
1 1.0 Fruit
3 3.0 Fruit
4 1.5   Veg
6 4.2   Veg
Cole answered Oct 31 '22 18:10



An alternative choice is data.table:

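> library(data.table)
> setDT(data)                      # convert to a data.table by reference
> data[(rowid(Group) %% 2) == 1]   # keep rows 1, 3, 5, ... within each group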
   Val Group                    
1: 1.0 Fruit                    
2: 3.0 Fruit                    
3: 1.5   Veg                    
4: 4.2   Veg                    
mt1022 answered Oct 31 '22 19:10



Another base R option selects every nth row in each group using ave(), exploiting vector recycling:

n <- 2

df[as.logical(with(df, ave(Val, Group, FUN = function(x) 
                      c(TRUE, rep(FALSE, n - 1))))),  ]

#  Val Group
#1 1.0 Fruit
#3 3.0 Fruit
#4 1.5   Veg
#6 4.2   Veg

This produces a warning whenever a group's length is not a multiple of n, because the vector returned by the function is then recycled incompletely, but I think it can be ignored.

Or use @thelatemail's idea from the comments, which does not give the warning message:

df[as.logical(with(df, ave(Val, Group, FUN = function(x) 
                   seq_along(x) %% 2 == 1))), ]
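The recycling pattern from the first snippet can also be built to the exact length of each group, so nothing is recycled implicitly and no warning should appear. This is a sketch under the assumption of the question's example data; rep_len() is swapped in here and is not part of the original answer, and `keep` and `result` are illustrative names:

```r
# Example data from the question
df <- data.frame(Val = c(1.0, 2.0, 3.0, 1.5, 2.8, 4.2, 5.1),
                 Group = c("Fruit", "Fruit", "Fruit", "Veg", "Veg", "Veg", "Veg"))

n <- 2

# Pattern TRUE, FALSE, ... (n - 1 FALSEs), repeated to exactly the group's length.
# ave() stores the result in an integer vector, so TRUE/FALSE become 1/0.
keep <- ave(seq_len(nrow(df)), df$Group,
            FUN = function(x) rep_len(c(TRUE, rep(FALSE, n - 1)), length(x)))

result <- df[as.logical(keep), ]
print(result)
```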
Ronak Shah answered Oct 31 '22 19:10
