I need to select N rows at a time from a pandas DataFrame using iterrows. Something like this:
def func():
    selected = []
    for i in range(N):
        selected.append(next(dataframe.iterrows()))
    yield selected
But when I do this, selected contains N identical elements, and each time I call func I get the same result (the first row of the dataframe).
If the dataframe is:
   A  B  C
0  5  8  2
1  1  2  3
2  4  5  6
3  7  8  9
4  0  1  2
5  3  4  5
6  7  8  6
7  1  2  3
What I want to obtain is:
N = 3
selected = [ [5,8,2], [1,2,3], [4,5,6] ]
then, calling the function again,
selected = [ [7,8,9], [0,1,2], [3,4,5] ]
then,
selected = [ [7,8,6], [1,2,3], [5,8,2] ]
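There is no need for .iterrows(). Each call to dataframe.iterrows() creates a fresh iterator, so next() on it always returns the first row. Use slicing instead: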
import pandas as pd

def flow_from_df(dataframe: pd.DataFrame, chunk_size: int = 10):
    # Step through the rows in blocks of chunk_size; the last block may be shorter.
    for start_row in range(0, dataframe.shape[0], chunk_size):
        end_row = min(start_row + chunk_size, dataframe.shape[0])
        yield dataframe.iloc[start_row:end_row, :]
To use it:
get_chunk = flow_from_df(dataframe)
chunk1 = next(get_chunk)
chunk2 = next(get_chunk)
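A short usage sketch, assuming the example dataframe from the question and chunk_size=3: looping over the generator and converting each chunk with .values.tolist() gives the list-of-lists form shown in the question. The generator is repeated here only so the snippet runs on its own. Note that the final chunk has only two rows, because slicing does not wrap around to the start.

```python
import pandas as pd


def flow_from_df(dataframe: pd.DataFrame, chunk_size: int = 10):
    # Same slicing generator as in the answer, repeated for a standalone example.
    for start_row in range(0, dataframe.shape[0], chunk_size):
        end_row = min(start_row + chunk_size, dataframe.shape[0])
        yield dataframe.iloc[start_row:end_row, :]


# Example data copied from the question.
dataframe = pd.DataFrame(
    {
        "A": [5, 1, 4, 7, 0, 3, 7, 1],
        "B": [8, 2, 5, 8, 1, 4, 8, 2],
        "C": [2, 3, 6, 9, 2, 5, 6, 3],
    }
)

# Convert every chunk to a plain list of row lists.
chunks = [chunk.values.tolist() for chunk in flow_from_df(dataframe, chunk_size=3)]
# chunks[0]  == [[5, 8, 2], [1, 2, 3], [4, 5, 6]]
# chunks[-1] == [[7, 8, 6], [1, 2, 3]]  (shorter last chunk, no wrap-around)
```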
Or, without a generator:
def get_chunk(dataframe: pd.DataFrame, chunk_size: int, start_row: int = 0) -> pd.DataFrame:
    end_row = min(start_row + chunk_size, dataframe.shape[0])
    return dataframe.iloc[start_row:end_row, :]
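Both versions stop at the last row, whereas the expected output in the question wraps around to the first row on the third call. If that wrap-around is really wanted, one possible sketch is to build a single cycling iterator over the rows and take N at a time. The helper name cycle_rows is hypothetical, and this uses itertools.cycle over dataframe.values.tolist() rather than iterrows, which is a substitution on my part. The key point is that the iterator is created once, outside the loop.

```python
from itertools import cycle, islice

import pandas as pd


def cycle_rows(dataframe: pd.DataFrame, n: int):
    """Yield lists of n rows indefinitely, wrapping around at the end.

    Hypothetical helper, not part of pandas.
    """
    # Create the row iterator ONCE; creating a new iterator on every call
    # (as in the question) would restart from the first row each time.
    rows = cycle(dataframe.values.tolist())
    while True:
        yield list(islice(rows, n))


# Example data copied from the question.
dataframe = pd.DataFrame(
    {
        "A": [5, 1, 4, 7, 0, 3, 7, 1],
        "B": [8, 2, 5, 8, 1, 4, 8, 2],
        "C": [2, 3, 6, 9, 2, 5, 6, 3],
    }
)

get_rows = cycle_rows(dataframe, 3)
first = next(get_rows)   # [[5, 8, 2], [1, 2, 3], [4, 5, 6]]
second = next(get_rows)  # [[7, 8, 9], [0, 1, 2], [3, 4, 5]]
third = next(get_rows)   # [[7, 8, 6], [1, 2, 3], [5, 8, 2]]
```

This generator never finishes, so it should be consumed with next() or a bounded loop rather than list(). It also copies all rows into Python lists up front, which may not suit a very large dataframe.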