I'm trying to create a bootstrapped sample from a multiindex dataframe in Pandas. Below is some code to generate the kind of data I need.
from itertools import product
import pandas as pd
import numpy as np
df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3],
                   'group2': [13, 18, 20, 77, 109, 123],
                   'value1': [1.1, 2, 3, 4, 5, 6],
                   'value2': [7.1, 8, 9, 10, 11, 12]
                   })
df = df.set_index(['group1', 'group2'])
print(df)
The df dataframe looks like:
               value1  value2
group1 group2
1      13         1.1     7.1
       18         2.0     8.0
       20         3.0     9.0
2      77         4.0    10.0
       109        5.0    11.0
3      123        6.0    12.0
I want to draw a random sample, with replacement, from the first index level (group1). For example, say np.random.randint(1, 4, size=3) produces [3, 2, 2]. I'd like the resulting dataframe to look like:
               value1  value2
group1 group2
3      123        6.0    12.0
2      77         4.0    10.0
       109        5.0    11.0
2      77         4.0    10.0
       109        5.0    11.0
I've spent a lot of time researching this and haven't found a similar example where the MultiIndex values are integers, the secondary index has a variable length, and the primary index samples repeat. As far as I understand, this is how an appropriate bootstrapping implementation should work.
The MultiIndex object is the hierarchical analogue of the standard Index object which typically stores the axis labels in pandas objects. You can think of MultiIndex as an array of tuples where each tuple is unique. A MultiIndex can be created from a list of arrays (using MultiIndex.
Try:
df.unstack().sample(3, replace=True).stack()
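If reshaping with unstack/stack is not convenient, another option is to draw the group1 labels directly and concatenate the matching row blocks. The following is only a minimal sketch, not a definitive implementation: the helper name bootstrap_groups and its parameters (level, n, seed) are made up for illustration, and it assumes you want whole groups repeated as often as their label is drawn.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3],
                   'group2': [13, 18, 20, 77, 109, 123],
                   'value1': [1.1, 2, 3, 4, 5, 6],
                   'value2': [7.1, 8, 9, 10, 11, 12]})
df = df.set_index(['group1', 'group2'])


def bootstrap_groups(df, level='group1', n=None, seed=None):
    """Hypothetical helper: resample whole groups of `level` with replacement."""
    rng = np.random.default_rng(seed)
    # Unique labels actually present in the chosen index level.
    labels = df.index.get_level_values(level).unique().to_numpy()
    n = len(labels) if n is None else n
    drawn = rng.choice(labels, size=n, replace=True)
    # xs(..., drop_level=False) returns every row of the group and keeps
    # the full MultiIndex, so repeated labels yield repeated row blocks.
    blocks = [df.xs(label, level=level, drop_level=False) for label in drawn]
    return pd.concat(blocks)


sample = bootstrap_groups(df, n=3, seed=0)
print(sample)
```

Because the labels are taken from the index itself, this does not depend on the group1 values being consecutive integers, and groups of different lengths are handled without introducing NaN placeholders.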