
Split dataframe by rows and generate list of dataframes in python

I have a dataframe:

import pandas as pd

data = {'Timestep'       : [0,1,2,0,1,2,3,0,1],
        'Price'          : [5,7,3,5,7,10,8,4,8],
        'Time Remaining' : [10.0,10.0,10.0,15.0,15.0,15.0,15.0,12.0,12.0]}
df = pd.DataFrame(data, columns = ['Timestep','Price','Time Remaining'])

[image: Dataframe]

I would like to transform the dataframe into a list of multiple dataframes, where each timestep sequence (0-2, 0-3, 0-1) is one dataframe. Furthermore, I want the timesteps to be the index of each dataframe. It should look like this in the end:

[image: list with multiple dataframes]

I have a dataframe with thousands of rows and irregular sequences, so I guess I have to iterate through the rows.

Does anyone know how I can approach this problem?

Elena asked Oct 03 '19 09:10




1 Answer

From what I understand, you need a new DataFrame whenever Timestep hits 0.

Here is something you can try:

# Assumes df has the default RangeIndex (0, 1, 2, ...), so labels equal positions.
# Locations of all zeros in Timestep: [0, 3, 7]
zero_indices = list(df.loc[df.Timestep == 0].index)
# Append the number of rows so the last dataframe has an end bound: [0, 3, 7, 9]
zero_indices.append(len(df))
# Pair up consecutive entries to get the ranges: [(0, 3), (3, 7), (7, 9)]
zero_ranges = [(zero_indices[i], zero_indices[i + 1]) for i in range(len(zero_indices) - 1)]
# Extract each range into its own dataframe; .loc slices include the end label, hence the - 1
list_of_dfs = [df.loc[start:end - 1].copy(deep=True) for start, end in zero_ranges]
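
The question also asks for the timesteps to become the index of each dataframe. One possible way to get that without building the ranges by hand is sketched below. It assumes every sequence starts with Timestep == 0; the name sequence_id is made up for this illustration.

```python
import pandas as pd

data = {'Timestep': [0, 1, 2, 0, 1, 2, 3, 0, 1],
        'Price': [5, 7, 3, 5, 7, 10, 8, 4, 8],
        'Time Remaining': [10.0, 10.0, 10.0, 15.0, 15.0, 15.0, 15.0, 12.0, 12.0]}
df = pd.DataFrame(data)

# Assumption: each 0 in Timestep starts a new sequence. The running count of
# zeros then labels every row with its sequence: 1, 1, 1, 2, 2, 2, 2, 3, 3.
sequence_id = df['Timestep'].eq(0).cumsum()

# groupby yields one sub-frame per sequence label;
# set_index turns Timestep into the index of each sub-frame.
list_of_dfs = [group.set_index('Timestep') for _, group in df.groupby(sequence_id)]

for sub_df in list_of_dfs:
    print(sub_df, end='\n\n')
```

If the data does not begin with a 0, the rows before the first 0 get the label 0 and end up in a dataframe of their own.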
Mortz answered Sep 21 '22 23:09