I know this must have been answered somewhere, but I just could not find it.
Problem: Sample each group after a groupby operation.
    import pandas as pd
    df = pd.DataFrame({'a': [1,2,3,4,5,6,7], 'b': [1,1,1,0,0,0,0]})
    grouped = df.groupby('b')
    # now sample from each group, e.g., I want 30% of each group
Calling groupby() returns a GroupBy object, not a dict of DataFrames. Its groups attribute is a dict that maps each group key to the index labels of the rows in that group, so you can list the group keys with grouped.groups.keys().
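A minimal sketch of inspecting the groups, using the DataFrame from the question. The variable names `group_keys`, `group_zero`, and `sizes` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [1, 1, 1, 0, 0, 0, 0]})
grouped = df.groupby('b')

# .groups is a dict: group key -> index labels of the rows in that group
group_keys = sorted(grouped.groups.keys())

# get_group() returns the sub-DataFrame for a single key
group_zero = grouped.get_group(0)

# Iterating over the GroupBy object yields (key, sub-DataFrame) pairs
sizes = {int(key): len(sub_df) for key, sub_df in grouped}
```

Iterating is usually the simplest way to work with each group as its own DataFrame.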
Group DataFrame using a mapper or by a Series of columns. A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
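To make the split, apply, and combine steps concrete, here is a rough sketch that performs them by hand and compares the result with the built-in aggregation. It reuses the question's data; `pieces`, `means`, `manual`, and `builtin` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [1, 1, 1, 0, 0, 0, 0]})

# Split: one sub-DataFrame per distinct value of 'b'
pieces = {int(key): sub_df for key, sub_df in df.groupby('b')}

# Apply: compute the mean of column 'a' within each piece
means = {key: sub_df['a'].mean() for key, sub_df in pieces.items()}

# Combine: collect the per-group results into a single Series
manual = pd.Series(means, name='a').sort_index()

# The same three steps expressed as one groupby aggregation
builtin = df.groupby('b')['a'].mean()
```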
Use count() by column name: use pandas DataFrame.groupby() to group the rows by a column, then call count() to get the number of values in each group, ignoring None and NaN values.
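As a small illustration of how missing values are treated, the sketch below uses a variant of the question's data with two missing entries added (an assumption made purely for the example) and compares count() with size():

```python
import numpy as np
import pandas as pd

# Same shape as the question's data, with two values replaced by NaN/None
df = pd.DataFrame({'a': [1, 2, np.nan, 4, 5, None, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# count() counts only the non-missing values of 'a' in each group
counts = df.groupby('b')['a'].count()

# size() counts every row in each group, missing or not
sizes = df.groupby('b').size()
```

If you need the number of rows rather than the number of non-missing values, size() is the one to use.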
You can also call reset_index() on your groupby result to get back a DataFrame with the name column (the grouping key) accessible as a regular column again. If you perform an operation on a single column, the result is a Series indexed by the group keys (a MultiIndex when you group by more than one key); you can pass it to pd.DataFrame and then call reset_index().
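A short sketch of that pattern, using the question's data, where the grouping column is `b` rather than `name`. Calling reset_index() directly on the Series also returns a DataFrame, so wrapping it in pd.DataFrame first is optional:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [1, 1, 1, 0, 0, 0, 0]})

# Aggregating one column gives a Series indexed by the group key 'b'
summed = df.groupby('b')['a'].sum()

# reset_index() moves 'b' out of the index and back into a column
flat = summed.reset_index()
```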
Apply a lambda and call sample with the frac parameter:
    In [2]:
    df = pd.DataFrame({'a': [1,2,3,4,5,6,7], 'b': [1,1,1,0,0,0,0]})
    grouped = df.groupby('b')
    grouped.apply(lambda x: x.sample(frac=0.3))

    Out[2]:
         a  b
    b
    0 6  7  0
    1 2  3  1
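As an alternative sketch, assuming pandas 1.1 or newer, the GroupBy object has its own sample() method, so the lambda is not needed. The `random_state` value here is an arbitrary choice to make the draw repeatable:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [1, 1, 1, 0, 0, 0, 0]})

# Sample 30% of each group directly (assumes pandas >= 1.1)
sampled = df.groupby('b').sample(frac=0.3, random_state=0)

# The result keeps the original row labels instead of a (b, row) MultiIndex
picked_labels = set(sampled.index)
```

With 4 and 3 rows in the two groups, 30% rounds to one row per group, which matches the shape of the output above. Which rows are drawn depends on the random state.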