I have a pandas DataFrame with the following structure:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(32).reshape((4, 8)),
                  index=pd.date_range('2016-01-01', periods=4),
                  columns=['male ; 0', 'male ; 1', 'male ; 2', 'male ; 3',
                           'female ; 0', 'female ; 1', 'female ; 2', 'female ; 3'])
The column names are messy: each header combines two variables, and there is residual punctuation left over from the original spreadsheet.
What I want to do is give my DataFrame a two-level column MultiIndex with levels named Sex and Age.
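Because each header already encodes both variables as "sex ; age", the MultiIndex can also be derived from the existing headers instead of being typed out. This is a minimal sketch, assuming every header contains exactly one ";" separator and an integer age; the capitalization step is only there to match the "Male"/"Female" labels used below.

```python
import numpy as np
import pandas as pd

# Sample frame with headers in the "sex ; age" form from the question.
df = pd.DataFrame(
    np.arange(32).reshape((4, 8)),
    index=pd.date_range("2016-01-01", periods=4),
    columns=["male ; 0", "male ; 1", "male ; 2", "male ; 3",
             "female ; 0", "female ; 1", "female ; 2", "female ; 3"],
)

tuples = []
for name in df.columns:
    # Split on ";" and strip the surrounding spaces; unpacking raises
    # ValueError if a header does not have exactly two parts.
    sex, age = (part.strip() for part in name.split(";"))
    tuples.append((sex.capitalize(), int(age)))

df.columns = pd.MultiIndex.from_tuples(tuples, names=["Sex", "Age"])
print(df.columns)
```

Parsing the headers keeps whatever ages are actually present in the data, so gaps or uneven age ranges per sex are preserved rather than assumed.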
I tried using pd.MultiIndex.from_tuples like this:
columns = [('Male', 0),('Male', 1),('Male', 2),('Male', 3),('Female', 0),('Female', 1),('Female', 2),('Female', 3)]
df.columns = pd.MultiIndex.from_tuples(columns)
And then naming the column levels:
df.columns.names = ['Sex', 'Age']
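The two steps above can also be done in one call, because MultiIndex.from_tuples accepts a names argument. A short sketch using the same eight tuples, with a selection at the end to show what the two-level header makes possible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(32).reshape((4, 8)),
                  index=pd.date_range("2016-01-01", periods=4))

columns = [("Male", 0), ("Male", 1), ("Male", 2), ("Male", 3),
           ("Female", 0), ("Female", 1), ("Female", 2), ("Female", 3)]
# names= sets the level names in the same call, replacing the separate
# df.columns.names assignment.
df.columns = pd.MultiIndex.from_tuples(columns, names=["Sex", "Age"])

# Selecting a top-level label returns a frame whose columns are the "Age" level.
females = df["Female"]
print(females)
```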
This gives the result I want. However, my DataFrames have ages up to over 100 for each sex, so writing the tuples out by hand is not practical.
Could someone please show me how to build the MultiIndex columns programmatically?
Jaco's answer works nicely, but you can also create the MultiIndex directly from a product using .from_product():
sex = ['Male', 'Female']
age = range(100)
df.columns = pd.MultiIndex.from_product([sex, age], names=['Sex', 'Age'])
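Note that the new index must have exactly as many entries as the frame has columns, so range(100) only fits a frame with 200 columns. A sketch applying the same call to the 8-column sample frame, assuming 4 ages per sex:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(32).reshape((4, 8)),
                  index=pd.date_range("2016-01-01", periods=4))

sex = ["Male", "Female"]
age = list(range(4))  # 4 ages per sex in the sample; widen for the real data

# from_product varies the last list fastest: Male 0..3, then Female 0..3.
# len(sex) * len(age) must equal the number of columns in df.
df.columns = pd.MultiIndex.from_product([sex, age], names=["Sex", "Age"])
print(df.columns)
```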
You can use the itertools module to generate your columns variable by taking the Cartesian product of sex and the age range in your data, for example:
import itertools
max_age = 100
sex = ['Male','Female']
age = range(max_age)
columns=list(itertools.product(sex, age))
df.columns = pd.MultiIndex.from_tuples(columns)
df.columns.names = ['Sex', 'Age']
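Both answers produce the same index: itertools.product and MultiIndex.from_product both iterate the last input fastest. A quick check, assuming 4 ages per sex to keep the output small:

```python
import itertools

import pandas as pd

sex = ["Male", "Female"]
age = list(range(4))

# Tuples from itertools, then converted to a MultiIndex.
via_tuples = pd.MultiIndex.from_tuples(list(itertools.product(sex, age)),
                                       names=["Sex", "Age"])
# Product built directly by pandas.
via_product = pd.MultiIndex.from_product([sex, age], names=["Sex", "Age"])

# Same entries in the same order: Male 0..3, then Female 0..3.
print(via_tuples.equals(via_product))
```

from_product is the shorter of the two; the itertools route is handy when the tuples need filtering or transforming before the index is built.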