Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to expand certain columns to rows using Python pandas

Tags:

python

pandas

I have a table of two columns with a many-to-many relationship.

e.g.

Animal     Food
rabbit     grass
rabbit     carrots
rabbit     cabbage
dog        carrots
horse      grass
horse      hay

I want something like this:

Animal     Food1   Food2     Food3 
rabbit     grass   carrots   cabbage  
dog        carrots
horse      grass   hay

or something like this:

Animal     Grass     Carrots    Cabbage     Hay
rabbit      True      True       True       False
dog         False     True       False      False
horse       True      False      False      True

I've tried pivot table, melt and stack, but still cannot figure out how to do this. Any help will be appreciated. Thanks!!

like image 871
qhan Avatar asked Dec 06 '22 14:12

qhan


1 Answers

Option 1
groupby with size

df.groupby(['Animal', 'Food']).size().unstack(fill_value=0).astype(bool)

Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False

s = df.groupby('Animal').Food.apply(list)
pd.DataFrame(s.values.tolist(), s.index).add_prefix('Food').fillna('')

          Food0    Food1    Food2
Animal                           
dog     carrots                  
horse     grass      hay         
rabbit    grass  carrots  cabbage

Option 2
groupby with value_counts

df.groupby('Animal').Food.value_counts().unstack(fill_value=0).astype(bool)

Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False

Option 3
groupby and str.get_dummies

df.groupby('Animal').Food.apply('|'.join).str.get_dummies().astype(bool)

        cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False

Option 4
pandas.factorize with numpy.bincount

f1, u1 = pd.factorize(df.Animal.values)
f2, u2 = pd.factorize(df.Food.values)

n = u1.size
m = u2.size

b = np.bincount(f1 * m + f2, minlength=n * m).reshape(n, m)

pd.DataFrame(b.astype(bool), u1, u2)

        grass  carrots  cabbage    hay
rabbit   True     True     True  False
dog     False     True    False  False
horse    True    False    False   True

Option 5
was bored... so came up with more

f, u = pd.factorize(df.Animal.values)
n = u.size

a = [[] for _ in range(n)]
[a[i].append(food) for i, food in zip(f, df.Food)];
pd.DataFrame(a, u).rename(columns=lambda x: x+1).add_prefix('Food').fillna('')

          Food1    Food2    Food3
rabbit    grass  carrots  cabbage
dog     carrots                  
horse     grass      hay         
like image 120
piRSquared Avatar answered Dec 08 '22 02:12

piRSquared