Pandas groupby and value_counts

I want to count how often each value occurs in every column (with pd.value_counts, I guess), grouping the data by one level of a MultiIndex. The MultiIndex part is handled by groupby(level=...), but apply raises a ValueError.

Original dataframe:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.choice(list('ABC'), size=(10,5)),
...                   columns=['c1','c2','c3','c4','c5'],
...                   index=pd.MultiIndex.from_product([['foo', 'bar'],
...                                                     ['w','y','x','y','z']]))



      c1 c2 c3 c4 c5
foo w  C  C  B  A  A
    y  A  A  C  B  A
    x  A  B  C  C  C
    y  A  B  C  C  C
    z  A  C  B  C  B
bar w  B  C  C  A  C
    y  A  A  C  A  A
    x  A  B  B  B  A
    y  A  A  C  A  B
    z  A  B  B  C  B

What I want:

       c1  c2  c3  c4  c5
foo A   4   2   0   3   2
    B   1   2   2   1   2
    C   0   1   3   1   1
bar A   4   1   0   1   2
    B   0   2   2   1   1
    C   1   2   3   3   2

What I tried:

>>> df.groupby(level=0).apply(pd.value_counts)

ValueError: could not broadcast input array from shape (5,5) into shape (5)

I can do it manually, but I think there must be a more obvious way.

groups = [g.apply(pd.value_counts).fillna(0) for n, g in df.groupby(level=0)]
index = df.index.get_level_values(0).unique()
correct_result = pd.concat(groups, keys=index)   # THIS WORKS AS EXPECTED

I mean, this isn't that long to write, but I feel like I'm reinventing the wheel. Isn't this kind of operation exactly what groupby is for?

Is there a more straightforward way of doing this, other than doing the split-apply-combine myself?
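For reference, the manual split-apply-combine above can be written as a self-contained sketch. It assumes a small hand-made frame (hypothetical data, not the random frame from the question) and calls the Series.value_counts method instead of the top-level pd.value_counts, which is deprecated in recent pandas releases.

```python
import pandas as pd

# Hypothetical deterministic data with a two-level MultiIndex.
df = pd.DataFrame(
    {"c1": ["A", "A", "B", "C"], "c2": ["B", "B", "B", "A"]},
    index=pd.MultiIndex.from_tuples(
        [("foo", "w"), ("foo", "x"), ("bar", "w"), ("bar", "x")]
    ),
)

# Split by the first index level, count values column by column,
# and turn the NaN left by missing values into 0.
groups = {
    name: g.apply(lambda col: col.value_counts()).fillna(0).astype(int)
    for name, g in df.groupby(level=0)
}

# Combine: the dict keys become the outer index level.
manual = pd.concat(groups)
print(manual)
```

Note that each group only lists the values that actually occur in it, so a value missing from a whole group has no row for that group.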

asked Aug 11 '18 12:08

Susensio

1 Answer

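Use stack to reshape the DataFrame into a Series with a three-level MultiIndex, then SeriesGroupBy.value_counts to count the values per group and per former column, and finally unstack to turn the column labels back into columns: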

import numpy as np
import pandas as pd

np.random.seed(123)

df = pd.DataFrame(np.random.choice(list('ABC'), size=(10,5)),
                 columns=['c1','c2','c3','c4','c5'],
                 index=pd.MultiIndex.from_product([['foo', 'bar'],
                                                   ['w','y','x','y','z']]))
print(df)
      c1 c2 c3 c4 c5
foo w  C  B  C  C  A
    y  C  C  B  C  B
    x  C  B  A  B  C
    y  B  A  C  A  B
    z  C  B  A  A  A
bar w  A  B  C  A  C
    y  A  A  B  A  B
    x  A  A  A  C  B
    y  B  C  C  C  B
    z  A  A  C  B  A

df1 = df.stack().groupby(level=[0,2]).value_counts().unstack(1, fill_value=0)
print(df1)
       c1  c2  c3  c4  c5
bar A   4   3   1   2   1
    B   1   1   1   1   3
    C   0   1   3   2   1
foo A   0   1   2   2   2
    B   1   3   1   1   2
    C   4   1   2   2   1
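To see what each step of the chain contributes, here is a minimal sketch that splits it into intermediate variables. The small frame and the names stacked, counts and result are assumptions made for illustration; the calls themselves are the ones used above.

```python
import pandas as pd

# Hypothetical deterministic data with a two-level MultiIndex.
df = pd.DataFrame(
    {"c1": ["A", "A", "B", "C"], "c2": ["B", "B", "B", "A"]},
    index=pd.MultiIndex.from_tuples(
        [("foo", "w"), ("foo", "x"), ("bar", "w"), ("bar", "x")]
    ),
)

# 1. stack(): move the column labels into a third index level,
#    producing one long Series of values.
stacked = df.stack()

# 2. Group by the outer index level (0) and the former column labels (2),
#    then count each value inside every group.
counts = stacked.groupby(level=[0, 2]).value_counts()

# 3. unstack(1): pivot the column labels back into columns; combinations
#    that never occur are filled with 0 instead of NaN.
result = counts.unstack(1, fill_value=0)
print(result)
```

Because fill_value=0 is applied during unstack, the counts stay integers and no separate fillna step is needed.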

answered Nov 15 '22 01:11

jezrael

Tags: python, pandas, pandas-groupby