I have a large data frame with a column holding a group name, and the data are grouped with dplyr, so multiple rows share the same group name. To reduce the data, I would like to extract every nth row of each group, starting from the first row of that group. Is there an R way to do this without loops?
Subsetting all rows with a single sequence has the problem that the first row of a group is often skipped, e.g.:
data[seq(1, nrow(data), 10), ] # some groups lose their first row
Input:
Val Group
1 1.0 Fruit
2 2.0 Fruit
3 3.0 Fruit
4 1.5 Veg
5 2.8 Veg
6 4.2 Veg
7 5.1 Veg
Desired output (every second row of each group; note the 3rd row, which is the first row of the Veg group):
Val Group
1 1.0 Fruit
2 3.0 Fruit
3 1.5 Veg
4 4.2 Veg
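To see why a single sequence over all rows fails on this input, here is a minimal base R sketch. It rebuilds the example input above as `data` (only the values shown in the question; Val is assumed numeric and Group character) and applies a step of 2 across the whole data frame.

```r
# Example input from the question
data <- data.frame(
  Val   = c(1.0, 2.0, 3.0, 1.5, 2.8, 4.2, 5.1),
  Group = c("Fruit", "Fruit", "Fruit", "Veg", "Veg", "Veg", "Veg")
)

# One sequence over all rows ignores where each group starts
global_pick <- data[seq(1, nrow(data), by = 2), ]
print(global_pick)
# Fruit has an odd number of rows (3), so the sequence goes from row 3
# straight to row 5 and skips row 4, the first Veg row (Val = 1.5)
```

Whenever a group has a row count that is not a multiple of the step, the offset carries over into the next group, which is why the selection has to restart inside each group.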
With dplyr, slice() on the grouped data keeps rows 1, 3, 5, ... of each group:
library(dplyr)
data %>%
  group_by(Group) %>%
  slice(seq(1, n(), by = 2))
This gives:
# A tibble: 4 x 2
# Groups: Group [2]
Val Group
<dbl> <fct>
1 1 Fruit
2 3 Fruit
3 1.5 Veg
4 4.2 Veg
Here's a base R way:
DF$ID_by_Group <- ave(DF$Val, DF$Group, FUN = seq_along)
DF[DF$ID_by_Group %in% seq(1, nrow(DF), by = 2), ]  # odd IDs 1, 3, 5, ... within each group
Val Group ID_by_Group
1 1.0 Fruit 1
3 3.0 Fruit 3
4 1.5 Veg 1
6 4.2 Veg 3
The ave() function creates a running row ID within each group (1, 2, 3, ..., restarting for every group). The second statement just filters on the ID_by_Group column we created.
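To make the ID step concrete, a small sketch (rebuilding the question's example as `DF`) of what ave(..., FUN = seq_along) returns: ave() splits Val by Group, applies seq_along() to each piece, and writes the results back in the original row order. The second part uses a made-up grouping vector `g` to show that the rows of a group do not need to be adjacent.

```r
DF <- data.frame(
  Val   = c(1.0, 2.0, 3.0, 1.5, 2.8, 4.2, 5.1),
  Group = c("Fruit", "Fruit", "Fruit", "Veg", "Veg", "Veg", "Veg")
)

# Running row index within each group, returned in the original row order
id_by_group <- ave(DF$Val, DF$Group, FUN = seq_along)
print(id_by_group)  # 1 2 3 1 2 3 4

# Interleaved groups (hypothetical data): ave() matches rows up by group
g <- c("a", "b", "a", "b", "a")
print(ave(seq_along(g), g, FUN = seq_along))  # 1 1 2 2 3
```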
Note that we could also do it in a single step, and/or drop the added column from the result:
DF[ave(DF$Val, DF$Group, FUN = seq_along) %in% seq(1, nrow(DF), by = 2), ]
DF$ID_by_Group <- ave(DF$Val, DF$Group, FUN = seq_along)
DF[DF$ID_by_Group %in% seq(1, nrow(DF), by = 2), -3]
DF[DF$ID_by_Group %in% seq(1, nrow(DF), by = 2), -grep('ID_by_Group', names(DF))]
DF[DF$ID_by_Group %in% seq(1, nrow(DF), by = 2), c('Val', 'Group')]
# all return:
Val Group
1 1.0 Fruit
3 3.0 Fruit
4 1.5 Veg
6 4.2 Veg
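The filters above use a fixed step of 2. A minimal sketch of a general version, wrapped in a hypothetical helper `every_nth_by_group()` (name and signature made up for illustration, not from any package), keeps rows 1, 1 + n, 1 + 2n, ... of each group. It passes seq_len(nrow(df)) to ave() instead of a value column, so the counter does not depend on the type of any data column.

```r
# Hypothetical helper (illustration only, not from any package):
# keep rows 1, 1 + n, 1 + 2n, ... within each group
every_nth_by_group <- function(df, group_col, n) {
  stopifnot(n >= 1)
  # running row index within each group, in the original row order
  id <- ave(seq_len(nrow(df)), df[[group_col]], FUN = seq_along)
  df[(id - 1) %% n == 0, , drop = FALSE]
}

DF <- data.frame(
  Val   = c(1.0, 2.0, 3.0, 1.5, 2.8, 4.2, 5.1),
  Group = c("Fruit", "Fruit", "Fruit", "Veg", "Veg", "Veg", "Veg")
)

every_2nd <- every_nth_by_group(DF, "Group", 2)
every_3rd <- every_nth_by_group(DF, "Group", 3)
print(every_2nd)  # Val 1.0, 3.0, 1.5, 4.2
print(every_3rd)  # Val 1.0, 1.5, 5.1
```

Testing (id - 1) %% n == 0 rather than id %% n == 1 keeps the condition correct for n = 1, where every row should be kept.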
An alternative is data.table:
library(data.table)
setDT(data)
data[rowid(Group) %% 2 == 1]
Val Group
1: 1.0 Fruit
2: 3.0 Fruit
3: 1.5 Veg
4: 4.2 Veg
Another base R option selects every nth row in each group with ave(), exploiting R's recycling rule:
n <- 2
df[as.logical(with(df, ave(Val, Group, FUN = function(x)
  c(TRUE, rep(FALSE, n - 1))))), ]
# Val Group
#1 1.0 Fruit
#3 3.0 Fruit
#4 1.5 Veg
#6 4.2 Veg
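This produces a warning whenever a group's size is not a multiple of n, because the pattern returned by FUN has to be recycled to the group's length. The selected rows are still correct, so I think the warning can be ignored.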
Or use @thelatemail's idea from the comments, which does not produce the warning:
df[as.logical(with(df, ave(Val, Group, FUN = function(x)
  seq_along(x) %% 2 == 1))), ]
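The recycling in the first ave() version can also be made explicit, which avoids the warning while keeping the TRUE, FALSE, ..., FALSE pattern idea. A small sketch (a variation, not either answerer's exact code): base R's rep_len() recycles the length-n pattern to exactly the group's length, which is what the assignment inside ave() otherwise does implicitly.

```r
n <- 2
pattern <- c(TRUE, rep(FALSE, n - 1))  # TRUE FALSE for n = 2

# Recycling the pattern to a group's length marks every nth row
print(rep_len(pattern, 3))  # TRUE FALSE TRUE
print(rep_len(pattern, 4))  # TRUE FALSE TRUE FALSE

df <- data.frame(
  Val   = c(1.0, 2.0, 3.0, 1.5, 2.8, 4.2, 5.1),
  Group = c("Fruit", "Fruit", "Fruit", "Veg", "Veg", "Veg", "Veg")
)

# Each group gets a vector of exactly its own length, so no
# "not a multiple of replacement length" warning is raised
keep <- as.logical(with(df, ave(Val, Group, FUN = function(x)
  rep_len(pattern, length(x)))))
df[keep, ]
```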