How to merge pandas value_counts() to dataframe or use it to subset a dataframe

Tags:

python

pandas

I used pandas df.value_counts() to find the number of occurrences of particular brands. I want to merge those value counts with the respective brands in the initial dataframe.

 # df has many columns, including one named 'brands'
 brands = df.brands.value_counts()

 brand1    143
 brand2     21
 brand3    101
 etc.

How do I merge the value counts with the original dataframe such that each brand's corresponding count is in a new column, say "brand_count"?

Is it possible to assign headers to these columns? The names function won't work with a Series, and I was unable to convert it to a DataFrame to merge the data that way. value_counts outputs a Series of dtype int64 (the brand names should be strings), which means I cannot do the following:

 df2 = pd.DataFrame({'brands': list(brands_all[0]),
                     'brand_count': list(brands_all[1])})
 (merge with df)
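For reference, one way around the Series-to-DataFrame conversion problem described above (a sketch with made-up data) is reset_index(), which turns the value_counts() index into an ordinary column that merge() can join on:

```python
import pandas as pd

# Hypothetical data standing in for the original df.
df = pd.DataFrame({'brands': ['brand1', 'brand2', 'brand1', 'brand3', 'brand1']})

# value_counts() returns a Series indexed by brand name;
# reset_index() turns that index into a regular column.
counts = df['brands'].value_counts().reset_index()
counts.columns = ['brands', 'brand_count']  # name the columns explicitly

# A left merge attaches each brand's count to every matching row.
merged = df.merge(counts, on='brands', how='left')
print(merged)
```

Setting `counts.columns` explicitly keeps this version-agnostic, since the column names produced by `reset_index()` here differ between pandas versions.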

Ultimately, I want to obtain this:

 col1  col2  col3  brands  brand_count ... col150
                   A        30
                   C        140
                   A        30
                   B        111 
user2476665 asked Mar 05 '16

People also ask

What does the value_counts() function do in pandas?

The value_counts() function returns a Series containing counts of unique values. The result is in descending order, so the first element is the most frequently occurring value.
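A minimal illustration with toy data:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'a', 'c'])

# Counts per unique value, most frequent first.
counts = s.value_counts()
print(counts)
```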

How do I merge one DataFrame to another?

The concat() function can be used to concatenate two DataFrames by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause: 'left', 'right' and 'inner' joins are all possible.

How do I merge two DataFrames in pandas?

Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes. If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.
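A small sketch with made-up frames:

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b', 'd'], 'y': [10, 20, 40]})

# Inner join keeps only keys present in both frames;
# neither source DataFrame is modified.
joined = left.merge(right, on='key', how='inner')
print(joined)
```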


2 Answers

I think the best way is to use map (note the question's column is named 'brands'):

 df['brand_count'] = df.brands.map(df.brands.value_counts())

This is much faster than the groupby method (roughly a factor of 500 on a 15,000-row df) and takes only one line.
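A quick sanity check of this approach with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'brands': ['brand1', 'brand2', 'brand1', 'brand1']})

# map() looks up each row's brand in the value_counts() Series,
# broadcasting the per-brand count back to every row.
df['brand_count'] = df['brands'].map(df['brands'].value_counts())
print(df)
```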

Egos answered Oct 21 '22

You want to use transform.

import numpy as np
import pandas as pd

np.random.seed(0)

# Create dummy data.
df = pd.DataFrame({'brands': ['brand{0}'.format(n)
                   for n in np.random.randint(0, 6, 10)]})

# transform('count') broadcasts each group's size back to the original rows.
df['brand_count'] = df.groupby('brands')['brands'].transform('count')

>>> df
   brands brand_count
0  brand4           1
1  brand5           2
2  brand0           1
3  brand3           4
4  brand3           4
5  brand3           4
6  brand1           1
7  brand3           4
8  brand5           2
9  brand2           1

For reference:

>>> df.brands.value_counts()
brand3    4
brand5    2
brand4    1
brand0    1
brand1    1
brand2    1
Name: brands, dtype: int64
Alexander answered Oct 21 '22