
Selecting a range of rows from an R data frame

Tags:

r

I have a data frame with 1000 rows and I want to perform some operation on it 100 rows at a time. How can I use a counter over the row numbers to select 100 rows at a time (1 to 100, then 101 to 200, and so on up to 1000) and perform the operation on each subset in a for loop? Can anyone suggest how this can be done? I could not find a good method.

Kunal Batra asked Feb 19 '23 12:02


1 Answer

An easy way would be to create a grouping variable, then use split() and lapply() to do whatever operations you need to.

Your grouping can be easily created using rep().

Here is an example:

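set.seed(1)
# Example data: 50 rows, two numeric columns
demo = data.frame(A = sample(300, 50, replace=TRUE),
                  B = rnorm(50))
# Grouping variable: rows 1-10 get 1, rows 11-20 get 2, and so on
demo$groups = rep(1:5, each=10)
# Split into a list of five 10-row data frames, then apply a function to each
demo.split = split(demo, demo$groups)
lapply(demo.split, colMeans)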
# $`1`
#           A           B      groups 
# 165.9000000  -0.1530186   1.0000000 
# 
# $`2`
#           A           B      groups 
# 168.2000000   0.1141589   2.0000000 
# 
# $`3`
#           A           B      groups 
# 126.0000000   0.1625241   3.0000000 
# 
# $`4`
#           A           B      groups 
# 159.4000000   0.3340555   4.0000000 
# 
# $`5`
#           A           B      groups 
# 181.8000000   0.0363812   5.0000000 

If you prefer not to add the groups column to your source data.frame, you can get the same result by passing the grouping vector to split() directly:

groups = rep(1:5, each=10)
lapply(split(demo, groups), colMeans)

Of course, replace colMeans with whatever function you want.
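
As a rough sketch of swapping in your own function: the example below uses a made-up data frame and a hypothetical helper, summarise_chunk(), neither of which comes from the question. It returns one summary row per chunk and binds the rows back together with do.call(rbind, ...).

```r
# Hypothetical example data (not the asker's data)
demo <- data.frame(A = 1:50, B = (1:50) / 10)
groups <- rep(1:5, each = 10)

# Hypothetical helper: takes one chunk (a data.frame) and
# returns a one-row data.frame of summaries
summarise_chunk <- function(chunk) {
  data.frame(n = nrow(chunk),
             sum_A = sum(chunk$A),
             max_B = max(chunk$B))
}

# Apply the helper to every chunk, then stack the one-row results
result <- do.call(rbind, lapply(split(demo, groups), summarise_chunk))
```

Any function that accepts a data.frame should work in place of summarise_chunk().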

Using your example of a data.frame with 1000 rows, your rep() statement should be something like:

rep(1:10, each=100)
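
If the number of rows might not be an exact multiple of the chunk size, one possible variation (an assumption on my part, not something the question requires) is to build the grouping from the row index with ceiling(). The sketch below uses a hypothetical 1000-row data frame df and also shows the explicit for loop over row ranges that the question describes; both give the same per-chunk means.

```r
# Hypothetical 1000-row data frame standing in for the asker's data
df <- data.frame(id = 1:1000, x = (1:1000) / 2)
chunk_size <- 100

# Rows 1-100 -> group 1, rows 101-200 -> group 2, and so on.
# If nrow(df) is not a multiple of chunk_size, the last group is shorter.
groups <- ceiling(seq_len(nrow(df)) / chunk_size)

# split()/sapply() version
chunks <- split(df, groups)
chunk_means <- sapply(chunks, function(chunk) mean(chunk$x))

# Equivalent explicit for loop over row ranges
starts <- seq(1, nrow(df), by = chunk_size)
loop_means <- numeric(length(starts))
for (i in seq_along(starts)) {
  rows <- starts[i]:min(starts[i] + chunk_size - 1, nrow(df))
  loop_means[i] <- mean(df$x[rows])
}
```

The split() version is usually shorter, while the loop makes the row ranges explicit if you need the start and end indices.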
A5C1D2H2I1M1N2O1R2T1 answered Mar 02 '23 23:03