Split DataFrame every x unique values into new Dataframes

I need to slice a long-format DataFrame every x unique values for visualization. My actual dataset has about 90 variables for 20 individuals, so I would like to split it into 9 separate DataFrames, each containing the entries for all 20 individuals for its subset of the variables.

I have created this simple example to help explain:

import pandas as pd

df = pd.DataFrame({'ID':[1,1,1,2,2,2,3,3,3,4,4,4],
                'Period':[1,2,3,1,2,3,1,2,3,1,2,3,],
                'Food':['Ham','Ham','Ham','Cheese','Cheese','Cheese','Egg','Egg','Egg','Bacon','Bacon','Bacon',]})
df

''' ******* PSEUDOCODE *******
    df1 = unique entries [:2]
    df2 = unique entries [2:4] '''


# desired outcome:

df1 = pd.DataFrame({'ID':[1,1,1,2,2,2,],
                'Period':[1,2,3,1,2,3,],
                'Food':['Ham','Ham','Ham','Cheese','Cheese','Cheese',]})

df2 = pd.DataFrame({'ID':[3,3,3,4,4,4],
                'Period':[1,2,3,1,2,3,],
                'Food':['Egg','Egg','Egg','Bacon','Bacon','Bacon',]})

print(df1)
print(df2)

In this case, the DataFrame would be split after every 2 unique entries in the df['Food'] column to create df1 and df2. Ideally, a loop would create a new DataFrame for every x unique entries. Given how little information I can find, I'm struggling to write even good pseudocode for that.

John Conor asked Jun 11 '26 21:06


2 Answers

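Let us try with factorize and groupby. factorize numbers each unique Food value in order of first appearance, and integer division by n puts every n consecutive values under the same group key: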

n = 2  # number of unique Food values per DataFrame
d = {x: y for x, y in df.groupby(df.Food.factorize()[0] // n)}
d[0]
Out[132]: 
   ID  Period    Food
0   1       1     Ham
1   1       2     Ham
2   1       3     Ham
3   2       1  Cheese
4   2       2  Cheese
5   2       3  Cheese
d[1]
Out[133]: 
    ID  Period   Food
6    3       1    Egg
7    3       2    Egg
8    3       3    Egg
9    4       1  Bacon
10   4       2  Bacon
11   4       3  Bacon
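
To see why this grouping key works, the intermediate values can be printed. This is only an illustrative sketch on the question's example data; the names codes, uniques and chunk_ids are introduced here and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'Period': [1, 2, 3] * 4,
    'Food': ['Ham'] * 3 + ['Cheese'] * 3 + ['Egg'] * 3 + ['Bacon'] * 3,
})

n = 2  # number of unique Food values per chunk

# factorize() labels each distinct value 0, 1, 2, ... in order of first appearance
codes, uniques = pd.factorize(df['Food'])

# integer division maps labels 0 and 1 to chunk 0, labels 2 and 3 to chunk 1, and so on
chunk_ids = codes // n

print(list(uniques))       # ['Ham', 'Cheese', 'Egg', 'Bacon']
print(chunk_ids.tolist())  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

Because the labels follow order of first appearance rather than alphabetical order, the chunks come out in the same order as the rows in the original DataFrame.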
BENY answered Jun 13 '26 12:06


A possible solution is the following:

# pip install pandas

import pandas as pd

df = pd.DataFrame({'ID':[1,1,1,2,2,2,3,3,3,4,4,4],
                'Period':[1,2,3,1,2,3,1,2,3,1,2,3,],
                'Food':['Ham','Ham','Ham','Cheese','Cheese','Cheese','Egg','Egg','Egg','Bacon','Bacon','Bacon',]})

dfs = [y for x, y in df.groupby('Food', as_index=False)]

The separate DataFrames (one per unique Food value, in sorted key order) can be accessed by list index (see below) or in a loop:

dfs[0]


dfs[1]


and so on.
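
If each DataFrame should hold several unique values at once, as in the question's desired outcome, one option is to slice the array of unique values and filter with isin. This is a minimal sketch; split_by_unique is a hypothetical helper name chosen for illustration, not a pandas function.

```python
import pandas as pd

def split_by_unique(frame, column, n):
    """Return a list of DataFrames, each covering up to n unique values of `column`."""
    uniques = frame[column].unique()  # unique values in order of first appearance
    chunks = []
    for start in range(0, len(uniques), n):
        wanted = uniques[start:start + n]  # the next n unique values
        chunks.append(frame[frame[column].isin(wanted)])
    return chunks

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'Period': [1, 2, 3] * 4,
    'Food': ['Ham'] * 3 + ['Cheese'] * 3 + ['Egg'] * 3 + ['Bacon'] * 3,
})

dfs = split_by_unique(df, 'Food', 2)
for i, part in enumerate(dfs, start=1):
    print(f"df{i}: {part['Food'].unique().tolist()}")
# df1: ['Ham', 'Cheese']
# df2: ['Egg', 'Bacon']
```

When the number of unique values is not a multiple of n, the last DataFrame in the list simply holds the remaining values.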
gremur answered Jun 13 '26 10:06


