I'm looking for some way to iterate over chunks in R, but right now I'm having to add an additional statement at the end to capture the remainder if the number of items does not divide evenly into chunksize. For example:
for (i in 1:(nrow(dataframe)/chunksize)){
(do something with chunk)
}
remainder <- nrow(dataframe) %% chunksize
(do something with dataframe[(nrow(dataframe) - remainder + 1):nrow(dataframe), ])
Is there a more elegant way to do this? I'm assuming this type of operation is done very often in other code.
If you really want to keep the for construct:
chunk_size <- 7
for (i in seq(1, nrow(mtcars), chunk_size)) {
  seq_size <- chunk_size
  if ((i + seq_size) > nrow(mtcars)) seq_size <- nrow(mtcars) - i + 1
  cat(i, seq_size, "\n")
}
1 7
8 7
15 7
22 7
29 4
You can use that to work on the indices you need to.
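As a minimal sketch of what "working on the indices" could look like, the start index can be paired with a clamped end index so the last, shorter chunk needs no special case. The names start, end and chunk_rows are illustrative and not part of the answer above, and the sketch assumes the data frame has at least one row (seq() raises an error when asked to count from 1 up to 0 with a positive step).

```r
chunk_size <- 7
n <- nrow(mtcars)

# Collect the size of every chunk so the result can be inspected afterwards.
chunk_rows <- integer(0)

for (start in seq(1, n, by = chunk_size)) {
  # Clamp the end index so the final chunk stops at the last row.
  end <- min(start + chunk_size - 1, n)
  chunk <- mtcars[start:end, ]
  # Placeholder for the real per-chunk work.
  chunk_rows <- c(chunk_rows, nrow(chunk))
}

print(chunk_rows)
```

Using min() here plays the same role as the if statement above; which one reads better is mostly a matter of taste.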
Here's one without the if:
chunk_size <- 7
chunks <- ggplot2::cut_interval(1:nrow(mtcars), length=chunk_size, labels=FALSE)
for (i in unique(chunks)) {
  print(nrow(mtcars[which(chunks==i),]))
}
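If pulling in ggplot2 only for the grouping feels heavy, a similar chunk id can probably be built in base R. This is a swapped-in technique, not what the answer above uses: it labels each row with ceiling(row_number / chunk_size) and splits the row indices by that label. The names chunk_id and index_groups are made up for illustration.

```r
chunk_size <- 7
n <- nrow(mtcars)

# Rows 1..7 get id 1, rows 8..14 get id 2, and so on;
# the last id simply covers whatever rows are left.
chunk_id <- ceiling(seq_len(n) / chunk_size)

# A list of integer vectors, one vector of row indices per chunk.
index_groups <- split(seq_len(n), chunk_id)

for (rows in index_groups) {
  chunk <- mtcars[rows, ]
  # Placeholder for the real per-chunk work.
  cat(min(rows), nrow(chunk), "\n")
}
```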
You can use split, building groups of at most chunksize rows with cumsum and modulo:
n = chunksize
lst = split(df, cumsum((1:nrow(df)-1)%%n==0))
lapply(lst, function(df_) {
  # some code on df_
})
Example:
df = data.frame(col1=letters[1:10])
n = 3 #you want small dataframes of 3 rows
#> split(df, cumsum((1:nrow(df)-1)%%n==0))
#$`1`
# col1
#1 a
#2 b
#3 c
#$`2`
# col1
#4 d
#5 e
#6 f
#$`3`
# col1
#7 g
#8 h
#9 i
#$`4`
# col1
#10 j
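To reuse the split approach, it could be wrapped in a small helper. The sketch below is one possible shape, not an established function: split_into_chunks is a hypothetical name, and it uses integer division (%/%) on zero-based row numbers as the grouping key instead of cumsum and modulo, so the list names start at 0 rather than 1.

```r
# Hypothetical helper: split a data frame into pieces of at most n rows.
split_into_chunks <- function(df, n) {
  stopifnot(n >= 1)
  # Zero-based row number divided by n gives 0 for the first n rows,
  # 1 for the next n rows, and so on.
  group <- (seq_len(nrow(df)) - 1) %/% n
  split(df, group)
}

df <- data.frame(col1 = letters[1:10])
chunks <- split_into_chunks(df, 3)

# Apply some work to every chunk; here just count the rows.
sizes <- vapply(chunks, nrow, integer(1))
print(unname(sizes))
```

The final chunk holds the single leftover row, so no separate remainder step is needed.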