I'm trying to split a pandas DataFrame into an unknown number of chunks, each containing N rows.
I have tried numpy.array_split(), but that function splits the DataFrame into N chunks, each containing an unknown number of rows.
Is there a clever way to split a pandas DataFrame into multiple DataFrames, each containing a specific number of rows from the parent DataFrame?
You can try this:
def rolling(df, window, step):
    count = 0
    df_length = len(df)
    # Continue until every row is covered; the last chunk may be shorter than `window`.
    while count < df_length:
        yield count, df.iloc[count:window + count]
        count += step
Usage:
for offset, window in rolling(df, 100, 100):
    # offset: the current offset index
    # window: the current chunk
    # first 100: how many rows in each chunk
    # second 100: how many rows to step at a time
    # your code here
    pass
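As a rough illustration of how the window and step arguments interact, here is a small self-contained sketch. The helper name `rolling_chunks` and the 10-row sample frame are made up for this example; it assumes the trailing, shorter chunk should be kept.

```python
import pandas as pd

def rolling_chunks(df, window, step):
    """Yield (offset, chunk) pairs; the last chunk may be shorter than `window`."""
    for offset in range(0, len(df), step):
        yield offset, df.iloc[offset:offset + window]

# Hypothetical sample data: 10 rows.
df = pd.DataFrame({"value": range(10)})

# step == window gives non-overlapping chunks.
sizes = [len(chunk) for _, chunk in rolling_chunks(df, 4, 4)]

# step < window gives overlapping windows.
offsets = [offset for offset, _ in rolling_chunks(df, 4, 2)]

print(sizes)
print(offsets)
```

With step equal to window, every row lands in exactly one chunk, which is what the question asks for. A smaller step is only useful if overlapping windows are wanted.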
There is also this simpler idea:
def chunk(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
Usage:
for df_chunk in chunk(df, 100):
    # 100 is the chunk size
    # your code here
    pass
By the way, all of this can be found on SO with a search.
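To make the behaviour concrete, here is a minimal sketch that runs the slicing generator on a tiny frame and on a plain list. The 7-row sample data is invented for illustration, and a default integer index is assumed so that plain slicing selects rows by position.

```python
import pandas as pd

def chunk(seq, size):
    # Slice the sequence in steps of `size`; the last slice may be shorter.
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

# Hypothetical sample data: 7 rows with the default integer index.
df = pd.DataFrame({"value": range(7)})

chunks = list(chunk(df, 3))
chunk_lengths = [len(c) for c in chunks]

# Gluing the chunks back together should give back every row in order.
restored_values = list(pd.concat(chunks)["value"])

# The same generator works on any sliceable sequence, for example a list.
list_chunks = list(chunk([1, 2, 3, 4, 5], 2))

print(chunk_lengths)
print(list_chunks)
```

If the frame has a non-default index, slicing with `seq.iloc[pos:pos + size]` is the safer choice, since `.iloc` always selects by position.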
Calculate the split positions:
size_of_chunks = 3
# Use row positions (not index labels) so this works with any index.
index_for_chunks = list(range(0, len(df), size_of_chunks))
index_for_chunks.append(len(df))
Use them to split the df:
dfs = {}
for i in range(len(index_for_chunks) - 1):
    dfs[i] = df.iloc[index_for_chunks[i]:index_for_chunks[i + 1]]
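A different technique, not taken from the answer above, is to number the rows by position and group on that number. The sketch below assumes a made-up 10-row frame with string labels as its index, to show that the split depends only on row position.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 10 rows with a non-numeric index.
df = pd.DataFrame({"value": range(10)}, index=list("abcdefghij"))
size_of_chunks = 3

# Row position // chunk size gives a chunk number for every row:
# 0, 0, 0, 1, 1, 1, 2, 2, 2, 3
chunk_ids = np.arange(len(df)) // size_of_chunks

# Group on the chunk number and collect each group into a dict.
dfs = {int(i): part for i, part in df.groupby(chunk_ids)}

lengths = [len(dfs[i]) for i in sorted(dfs)]
print(lengths)
```

Because the grouping key is built from `len(df)` rather than from index labels, this does not rely on the frame having a default 0..n-1 index.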