How to get a random (bootstrap) sample from pandas multiindex

Tags: python, pandas, multi-index, sampling

I'm trying to create a bootstrapped sample from a multiindex dataframe in Pandas. Below is some code to generate the kind of data I need.

import pandas as pd
import numpy as np

# Two-level index: group1 (outer) and group2 (inner); groups differ in size
df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3],
                   'group2': [13, 18, 20, 77, 109, 123],
                   'value1': [1.1, 2, 3, 4, 5, 6],
                   'value2': [7.1, 8, 9, 10, 11, 12]})
df = df.set_index(['group1', 'group2'])

print(df)

The df dataframe looks like:

                   value1  value2
group1 group2                
1      13         1.1     7.1
       18         2.0     8.0
       20         3.0     9.0
2      77         4.0    10.0
       109        5.0    11.0
3      123        6.0    12.0

I want to draw a random sample, with replacement, from the first index level (group1). For example, let's say the random values np.random.randint(1, 4, size=3) produce [3, 2, 2]. I'd like the resulting dataframe to look like:

                   value1  value2
group1 group2                
3      123        6.0    12.0
2      77         4.0    10.0
       109        5.0    11.0
2      77         4.0    10.0
       109        5.0    11.0

I've spent a lot of time researching this and haven't found a similar example where the MultiIndex values are integers, the second level has a variable number of entries per group, and the same first-level label can be drawn more than once. Resampling whole first-level groups like this is how I think bootstrapping should work here.
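
A minimal sketch of the resampling described above, as an alternative to reshaping the frame: draw first-level labels with NumPy, then pull each label's rows with `DataFrame.xs(..., level=0, drop_level=False)` and concatenate them in draw order, so repeated labels give repeated blocks. The helper names `take_groups` and `bootstrap_first_level` are hypothetical, and drawing as many labels as there are groups is an assumption (the usual bootstrap sample size).

```python
import numpy as np
import pandas as pd


def take_groups(df, labels):
    """Concatenate the rows of each outer-level label in `labels`, in order.

    A label that appears twice contributes its whole block of rows twice,
    which is what sampling groups with replacement requires.
    """
    blocks = [df.xs(label, level=0, drop_level=False) for label in labels]
    return pd.concat(blocks)


def bootstrap_first_level(df, rng=None):
    """Draw as many outer-level labels as there are groups, with replacement."""
    rng = np.random.default_rng(rng)
    groups = df.index.get_level_values(0).unique()
    labels = rng.choice(groups.to_numpy(), size=len(groups), replace=True)
    return take_groups(df, labels)


df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3],
                   'group2': [13, 18, 20, 77, 109, 123],
                   'value1': [1.1, 2, 3, 4, 5, 6],
                   'value2': [7.1, 8, 9, 10, 11, 12]}).set_index(['group1', 'group2'])

print(take_groups(df, [3, 2, 2]))        # the fixed draw from the example
print(bootstrap_first_level(df, rng=0))  # one random bootstrap resample
```

Fixing the labels in `take_groups` reproduces the desired output exactly; `bootstrap_first_level` only adds the random draw on top.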

asked Aug 02 '16 23:08

Chris

1 Answer

Try:

df.unstack().sample(3, replace=True).stack()
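
The one-liner above can be split into its three steps to show why it works, as a sketch assuming the `df` from the question. Two changes are my own assumptions rather than part of the answer: `n=len(wide)` (one draw per group instead of the hard-coded 3) with `random_state=0` only for repeatability, and a trailing `dropna(how="all")`, because the newer `stack` implementation (opt-in via `future_stack=True` in pandas 2.1+) does not drop the NaN padding that the older one removed.

```python
import pandas as pd

df = pd.DataFrame({'group1': [1, 1, 1, 2, 2, 3],
                   'group2': [13, 18, 20, 77, 109, 123],
                   'value1': [1.1, 2, 3, 4, 5, 6],
                   'value2': [7.1, 8, 9, 10, 11, 12]}).set_index(['group1', 'group2'])

# unstack(): one row per group1 label; group2 moves into the columns,
# padded with NaN where a group1 label has no such group2 entry.
wide = df.unstack()

# Sample whole rows (i.e. whole group1 groups) with replacement.
sampled = wide.sample(n=len(wide), replace=True, random_state=0)

# stack(): move group2 back into the index; drop the all-NaN padding rows
# in case this pandas version's stack() keeps them.
boot = sampled.stack().dropna(how="all")
print(boot)
```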

answered Nov 15 '22 09:11

piRSquared
