Here is an example of what I am trying to do:
import io
import numpy as np
import pandas as pd
data = io.StringIO('''Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,99,$0.39
''')
df_unindexed = pd.read_csv(data)
df = df_unindexed.set_index(['Fruit', 'Color'])
Output:
Out[5]:
             Count  Price
Fruit Color
Apple Red        3  $1.29
      Green      9  $0.99
Pear  Red       25  $2.59
      Green     26  $2.79
Lime  Green     99  $0.39
Now let's say I want to count the keys in the 'Color' level within each 'Fruit':
L = []
for i in pd.unique(df.index.get_level_values(0)):
    L.append(range(df.xs(i).shape[0]))
list(np.concatenate(L))
Then I add the resulting list, [0, 1, 0, 1, 0], as a new column:
df['Bob'] = list(np.concatenate(L))
like so:
             Count  Price  Bob
Fruit Color
Apple Red        3  $1.29    0
      Green      9  $0.99    1
Pear  Red       25  $2.59    0
      Green     26  $2.79    1
Lime  Green     99  $0.39    0
My question:
How do I make the Bob column an index level alongside Color? This is what I want:
                 Count  Price
Fruit Color Bob
Apple Red   0        3  $1.29
      Green 1        9  $0.99
Pear  Red   0       25  $2.59
      Green 1       26  $2.79
Lime  Green 0       99  $0.39
Are you looking for cumcount? If so, you can ditch the loop and vectorize your solution.
df = df.set_index(df.groupby(level=0).cumcount(), append=True)
print(df)
               Count  Price
Fruit Color
Apple Red   0      3  $1.29
      Green 1      9  $0.99
Pear  Red   0     25  $2.59
      Green 1     26  $2.79
Lime  Green 0     99  $0.39
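As a quick sanity check, here is a minimal, self-contained sketch comparing the loop from the question with cumcount on the same sample data. The names csv_text, loop_counts and vectorized_counts are made up for this illustration; they do not appear in the question.

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = """Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,99,$0.39
"""

df = pd.read_csv(io.StringIO(csv_text)).set_index(["Fruit", "Color"])

# Loop from the question: one range per first-level key, in order of appearance.
loop_counts = []
for fruit in df.index.get_level_values(0).unique():
    loop_counts.extend(range(df.xs(fruit).shape[0]))

# Vectorized version: position of each row within its 'Fruit' group.
vectorized_counts = df.groupby(level=0).cumcount()

print(loop_counts)
print(vectorized_counts.tolist())
```

Both lists should match here because the rows for each fruit are contiguous. If the same fruit appeared in non-adjacent rows, the loop's concatenated ranges would no longer line up with the row order, whereas cumcount numbers each row in place.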
Or, if you'd prefer doing this in one fell swoop, build all three levels from the unindexed frame:
data.seek(0)  # rewind the buffer; the first read_csv call already consumed it
df_unindexed = pd.read_csv(data)
df = df_unindexed.set_index(['Fruit', 'Color', df_unindexed.groupby('Fruit').cumcount()])
print(df)
               Count  Price
Fruit Color
Apple Red   0      3  $1.29
      Green 1      9  $0.99
Pear  Red   0     25  $2.59
      Green 1     26  $2.79
Lime  Green 0     99  $0.39
To name the new index level, use rename_axis:
df = df.rename_axis(['Fruit', 'Color', 'Bob'])
print(df)
                 Count  Price
Fruit Color Bob
Apple Red   0        3  $1.29
      Green 1        9  $0.99
Pear  Red   0       25  $2.59
      Green 1       26  $2.79
Lime  Green 0       99  $0.39
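If you'd rather skip the separate rename step, one option is to name the cumcount Series before appending it, since set_index takes the level name from the Series name. This is a sketch under that assumption; csv_text and bob are names invented for the example.

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = """Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,99,$0.39
"""

df = pd.read_csv(io.StringIO(csv_text)).set_index(["Fruit", "Color"])

# Name the per-fruit counter up front; the name becomes the index level name.
bob = df.groupby(level=0).cumcount().rename("Bob")

# Append it as a third index level next to 'Fruit' and 'Color'.
df = df.set_index(bob, append=True)

print(list(df.index.names))
print(df)
```

Whether this reads better than rename_axis is a matter of taste; rename_axis is handier when you want to rename several levels at once.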