 

Elegant way to loop over chunks with a remainder in R?

Tags:

r

I'm looking for a way to iterate over chunks of a data frame in R. Right now I have to add an extra statement at the end to handle the remainder when the number of rows does not divide evenly by chunksize. For example:

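n_chunks <- nrow(dataframe) %/% chunksize
for (i in seq_len(n_chunks)) {
  chunk <- dataframe[((i - 1) * chunksize + 1):(i * chunksize), ]
  # (do something with chunk)
}

remainder <- nrow(dataframe) %% chunksize
if (remainder > 0) {
  # (do something with the last, shorter chunk)
  last <- dataframe[(nrow(dataframe) - remainder + 1):nrow(dataframe), ]
}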

Is there a more elegant way to do this? I assume this kind of operation is very common.

asked Oct 05 '15 15:10 by Allen Wang


2 Answers

If you really want to keep the for construct:

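chunk_size <- 7
for (i in seq(1, nrow(mtcars), chunk_size)) {

  # the last chunk is shorter when nrow() is not a multiple of chunk_size
  seq_size <- chunk_size
  if ((i + seq_size) > nrow(mtcars)) seq_size <- nrow(mtcars) - i + 1

  cat(i, seq_size, "\n")  # start row and length of this chunk

}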

1 7 
8 7 
15 7 
22 7 
29 4 

You can use i and seq_size to index the rows of each chunk, e.g. mtcars[i:(i + seq_size - 1), ].
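A minimal sketch of that idea, assuming each chunk is a block of consecutive rows handed to some processing step. Here min() caps the last chunk in place of the if; chunk_rows is a hypothetical accumulator used only to show what each iteration sees.

```r
chunk_size <- 7
n <- nrow(mtcars)          # mtcars ships with base R (datasets package), 32 rows
chunk_rows <- integer(0)   # hypothetical: number of rows seen in each chunk

for (i in seq(1, n, by = chunk_size)) {
  # min() shortens the final chunk when n is not a multiple of chunk_size
  seq_size <- min(chunk_size, n - i + 1)
  chunk <- mtcars[i:(i + seq_size - 1), ]
  # ... do something with chunk ...
  chunk_rows <- c(chunk_rows, nrow(chunk))
}

print(chunk_rows)
```

Note that seq(1, n, by = chunk_size) errors when n is 0, so an empty data frame would need a guard.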

Here's one without the if:

chunk_size <- 7
# needs ggplot2; labels = FALSE returns integer chunk codes 1, 2, 3, ...
chunks <- ggplot2::cut_interval(1:nrow(mtcars), length=chunk_size, labels=FALSE)
for (i in unique(chunks)) {
  print(nrow(mtcars[which(chunks==i),]))
}
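If you'd rather not depend on ggplot2, a similar grouping vector can be built in base R with ceiling(). This is a swapped-in base R technique, not what cut_interval does internally; a sketch:

```r
chunk_size <- 7
n <- nrow(mtcars)

# Row k goes to chunk ceiling(k / chunk_size): rows 1-7 -> 1, 8-14 -> 2, ...
chunks <- ceiling(seq_len(n) / chunk_size)

for (i in unique(chunks)) {
  chunk <- mtcars[chunks == i, ]
  print(nrow(chunk))
}
```

The same vector also works as the f argument of split(), which avoids the explicit loop.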
answered Nov 05 '22 04:11 by hrbrmstr


You can use split with a grouping vector built from cumsum and modulo, which gives groups of chunksize rows each, with the last group holding the remainder:

n <- chunksize
# TRUE at rows 1, n+1, 2n+1, ...; cumsum turns these into group labels 1, 2, 3, ...
lst <- split(df, cumsum((1:nrow(df) - 1) %% n == 0))

lapply(lst, function(df_)
{
    #some code on df_
})
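The split + lapply pattern can be wrapped in a small helper. apply_chunks is a hypothetical name used only for this sketch; it assumes you want one result per chunk returned as a list.

```r
# Hypothetical helper: split df into consecutive blocks of at most n rows
# and apply f to each block, returning a list of results (one per block).
apply_chunks <- function(df, n, f) {
  # TRUE at rows 1, n + 1, 2n + 1, ...; cumsum turns those into labels 1, 2, 3, ...
  groups <- cumsum((seq_len(nrow(df)) - 1) %% n == 0)
  lapply(split(df, groups), f)
}

# Example: mean mpg for each block of 10 cars in mtcars (32 rows -> 10, 10, 10, 2)
res <- apply_chunks(mtcars, 10, function(df_) mean(df_$mpg))
str(res)
```

Using seq_len() instead of 1:nrow(df) keeps the helper safe for a data frame with zero rows.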

Example:

df <- data.frame(col1=letters[1:10])
n <- 3  # you want small data frames of 3 rows

#> split(df, cumsum((1:nrow(df)-1)%%n==0))
#$`1`
#  col1
#1    a
#2    b
#3    c

#$`2`
#  col1
#4    d
#5    e
#6    f

#$`3`
#  col1
#7    g
#8    h
#9    i

#$`4`
#   col1
#10    j
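To check the grouping, you can print the vector passed to split() and the size of each piece. The integer-division form g_div is an equivalent alternative added for comparison, not part of the original answer.

```r
df <- data.frame(col1 = letters[1:10])
n <- 3

g_cumsum <- cumsum((seq_len(nrow(df)) - 1) %% n == 0)
g_div    <- (seq_len(nrow(df)) - 1) %/% n + 1   # same labels via integer division

print(g_cumsum)   # 1 1 1 2 2 2 3 3 3 4
lst <- split(df, g_cumsum)
print(sapply(lst, nrow))
```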
answered Nov 05 '22 04:11 by Colonel Beauvel