Selecting the first n rows of each group, where n varies by group

Tags:

dataframe

r

I would like to select the first 2, 3, 0 and 4 rows of groups 1, 2, 3 and 4, respectively, in a data frame.

> f <- data.frame(group = c(1,1,1,2,2,3,4), y = c(1:7))
> 
>   group y
>       1 1
>       1 2
>       1 3
>       2 4
>       2 5
>       3 6
>       4 7

and obtain the following data frame:

group y
1 1
1 2
2 4
2 5
4 7

I tried to use by() and head(), but head() does not accept a vector of row counts.

Thank you for your help.

Tony asked Feb 25 '23 21:02


2 Answers

With the more traditional lapply:

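# number of rows to keep, in the order of the sorted group labels
k <- c(2,3,0,4)
# one data frame per group
fs <- split(f, f$group)
# take the first k[i] rows of the i-th group, then stack the pieces
do.call(rbind, lapply(seq_along(k), function(i) head(fs[[i]], k[i])))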

The result is:

  group y
1     1 1
2     1 2
4     2 4
5     2 5
7     4 7
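
The code above pairs each count with a group by position, so it assumes k is given in the same order as the sorted group labels. If that is a concern, here is a rough sketch of a variant that looks the counts up by group label instead. The names on k are my own assumption and not part of the question; f is rebuilt so the snippet runs on its own.

```r
f <- data.frame(group = c(1, 1, 1, 2, 2, 3, 4), y = 1:7)

# Assumption: counts are keyed by group label rather than by position
k <- c("1" = 2, "2" = 3, "3" = 0, "4" = 4)

fs <- split(f, f$group)

# Pair each group's rows with the count stored under the same label
pieces <- Map(function(d, n) head(d, n), fs, k[names(fs)])

# unname() keeps the original row names instead of prefixing them
res <- do.call(rbind, unname(pieces))
res
```

Because the lookup uses names(fs), the order in which the counts are written no longer matters.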

Aaron left Stack Overflow answered Feb 27 '23 16:02


Using plyr:

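library(plyr)
rows <- c(2,3,0,4)
# x[1,1] is the group label of the current piece; it is used as an index
# into rows, so this relies on the labels being 1, 2, 3, ...
ddply(f, .(group), function(x) head(x, rows[x[1,1]]))

  group y
1     1 1
2     1 2
3     2 4
4     2 5
5     4 7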
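
If you would rather not load a package, the same selection can probably be done with a logical filter in base R. This is only a sketch of an alternative technique, not the plyr approach above: it computes each row's position within its group using ave() and compares it with the count allowed for that group. It assumes rows is ordered like the sorted unique group labels; pos and limit are names I made up for illustration.

```r
f <- data.frame(group = c(1, 1, 1, 2, 2, 3, 4), y = 1:7)
rows <- c(2, 3, 0, 4)

# Position of each row within its own group: 1, 2, 3, 1, 2, 1, 1
pos <- ave(seq_len(nrow(f)), f$group, FUN = seq_along)

# Count allowed for each row's group
# (assumes rows follows the sorted unique group labels)
limit <- rows[match(f$group, sort(unique(f$group)))]

# Keep a row only while its position is within the allowed count
res <- f[pos <= limit, ]
res
```

A group with a count of 0 drops out on its own, since no position is less than or equal to 0.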

Edit: I misunderstood the question, so I updated the answer.

Sacha Epskamp answered Feb 27 '23 17:02