 

Pandas split one dataframe into multiple dataframes

Tags:

pandas

I have one pandas dataframe that I need to split into multiple dataframes. The number of dataframes depends on how many months of data I have, i.e., I need to create a new dataframe for every month. For example, df:

MONTH   NAME INCOME
201801   A     100$
201801   B      20$
201802   A      30$

So I need to create 2 dataframes. The problem is that I don't know in advance how many months of data I will have. How do I do that?

Victor asked Jan 04 '19 21:01




2 Answers

You can use groupby to create a dictionary of dataframes:

import pandas as pd

# Parse YYYYMM values (int or str) into datetimes
df['MONTH'] = pd.to_datetime(df['MONTH'].astype(str), format='%Y%m')
dfs = dict(tuple(df.groupby(df['MONTH'].dt.month)))
dfs[1]


    MONTH   NAME    INCOME
0   2018-01-01  A   100$
1   2018-01-01  B   20$
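A self-contained version of the approach above, run on the sample data from the question. Building `df` inline is an assumption for illustration, and `.astype(str)` is added so the `%Y%m` parse does not depend on whether MONTH is stored as integers or strings.

```python
import pandas as pd

# Sample data from the question; INCOME kept as strings like "100$"
df = pd.DataFrame({
    "MONTH": [201801, 201801, 201802],
    "NAME": ["A", "B", "A"],
    "INCOME": ["100$", "20$", "30$"],
})

# Convert YYYYMM to datetimes (via str so int and str input both work)
df["MONTH"] = pd.to_datetime(df["MONTH"].astype(str), format="%Y%m")

# Iterating a GroupBy yields (key, sub-frame) pairs; tuple() materializes
# them so dict() can build one entry per calendar month number (1 = Jan, ...)
dfs = dict(tuple(df.groupby(df["MONTH"].dt.month)))

print(dfs[1])
```

The keys here are month numbers only, so this variant is appropriate when all rows come from a single year.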

If your data spans multiple years, you need to include the year in the grouping:

dfs = dict(tuple(df.groupby([df['MONTH'].dt.year,df['MONTH'].dt.month])))
dfs[(2018, 1)]

    MONTH      NAME INCOME
0   2018-01-01  A   100$
1   2018-01-01  B   20$
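An alternative sketch (not part of the original answer): converting each date to a monthly `Period` with `.dt.to_period("M")` gives a single key that already distinguishes years, so January 2018 and January 2019 land in different frames. The extra 201901 row is made-up data added only to show the multi-year case.

```python
import pandas as pd

# Hypothetical data spanning two years (the 201901 row is illustrative)
df = pd.DataFrame({
    "MONTH": [201801, 201801, 201802, 201901],
    "NAME": ["A", "B", "A", "B"],
    "INCOME": ["100$", "20$", "30$", "40$"],
})
df["MONTH"] = pd.to_datetime(df["MONTH"].astype(str), format="%Y%m")

# A monthly Period encodes both year and month, so one key is enough
dfs = dict(tuple(df.groupby(df["MONTH"].dt.to_period("M"))))

for period, group in dfs.items():
    print(period, len(group))

# Look up a single month by constructing the matching Period
print(dfs[pd.Period("2019-01", freq="M")])
```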
Vaishali answered Sep 25 '22 09:09


You can use groupby to split a dataframe into a list of dataframes or a dictionary of dataframes:

Dictionary of dataframes:

dict_of_dfs = {}
for n, g in df.groupby(df['MONTH']):
    dict_of_dfs[n] = g

List of dataframes:

list_of_dfs = []
for _, g in df.groupby(df['MONTH']):
    list_of_dfs.append(g)

Or, as @BenMares suggests, use a comprehension:

dict_of_dfs = {
    month: group_df
    for month, group_df in df.groupby('MONTH')
}

list_of_dfs = [
    group_df
    for _, group_df in df.groupby('MONTH')
]
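Because `groupby` discovers the distinct MONTH values from the data itself, the number of months never has to be known up front. A minimal sketch on the question's sample data (built inline as an assumption), using `ngroups` to report how many frames were produced and `reset_index(drop=True)` to give each sub-frame its own 0-based index; the reset is optional.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "MONTH": [201801, 201801, 201802],
    "NAME": ["A", "B", "A"],
    "INCOME": ["100$", "20$", "30$"],
})

grouped = df.groupby("MONTH")

# Number of distinct MONTH values found in the data at runtime
print(grouped.ngroups)

# Without reset_index, each group keeps its original row labels
# (the 201802 group would keep label 2)
dict_of_dfs = {
    month: group_df.reset_index(drop=True)
    for month, group_df in grouped
}

# Works the same whether there are 2 months or 200
for month, group_df in dict_of_dfs.items():
    print(month, len(group_df))
```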
Scott Boston answered Sep 22 '22 09:09