Sample each group after pandas groupby

Tags: python, random, pandas, group-by, pandas-groupby

I know this must have been answered somewhere, but I just could not find it.

Problem: sample each group after a groupby operation.

    import pandas as pd

    df = pd.DataFrame({'a': [1,2,3,4,5,6,7],
                       'b': [1,1,1,0,0,0,0]})

    grouped = df.groupby('b')

    # now sample from each group, e.g., I want 30% of each group

asked Apr 03 '16 19:04

gongzhitaao

1 Answer

Apply a lambda to the grouped object and call sample on each group with the frac parameter:

    In [2]:
    df = pd.DataFrame({'a': [1,2,3,4,5,6,7],
                       'b': [1,1,1,0,0,0,0]})

    grouped = df.groupby('b')
    grouped.apply(lambda x: x.sample(frac=0.3))

    Out[2]:
         a  b
    b
    0 6  7  0
    1 2  3  1
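As a possible alternative to the lambda, pandas 1.1.0 and later also expose sample directly on the groupby object. The sketch below assumes such a version is installed; the helper name sample_each_group and its seed argument are hypothetical and only there for illustration, with the seed passed to random_state so the draw can be repeated.

```python
import pandas as pd


def sample_each_group(df, by, frac, seed=None):
    """Return a random fraction of the rows of every group in df.

    Hypothetical helper: it only wraps DataFrameGroupBy.sample,
    which is assumed to be available (pandas >= 1.1.0).
    """
    return df.groupby(by).sample(frac=frac, random_state=seed)


df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# Take roughly 30% of each group; the fixed seed makes the result repeatable.
sampled = sample_each_group(df, 'b', frac=0.3, seed=0)
print(sampled)
```

With frac=0.3 the per-group size is rounded, so the group of three rows (0.9) and the group of four rows (1.2) each contribute one row here. Unlike the apply version, the result keeps the original row index instead of adding the group key as an extra index level.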

answered Sep 20 '22 12:09

EdChum
