I used pandas value_counts() to count how many times each brand occurs. I want to merge those counts back onto the respective brands in the original DataFrame.
df has many columns, including one named 'brands':
brands = df.brands.value_counts()
brand1 143
brand2 21
brand3 101
etc.
How do I merge the value counts with the original dataframe such that each brand's corresponding count is in a new column, say "brand_count"?
Is it possible to assign headers to these columns? The names function won't work with a Series, and I was unable to convert it to a DataFrame so that I could merge the data that way. But value_counts outputs a Series of dtype int64 (the brand names should be strings), which means I cannot do the following:
df2 = pd.DataFrame({'brands': list(brands_all[0]), 'brand_count': list(brands_all[1])})
(merge with df)
Ultimately, I want to obtain this:
col1 col2 col3 brands brand_count ... col150
A 30
C 140
A 30
B 111
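On the header question: the Series returned by value_counts() keeps the brand names in its index and the counts in its values, so a two-column DataFrame can be built from those two parts instead of by position. A minimal sketch, using small hypothetical sample data in place of the real df:

```python
import pandas as pd

# Hypothetical sample data standing in for the real df.
df = pd.DataFrame({'brands': ['brand1', 'brand2', 'brand1', 'brand3', 'brand1']})

brands = df['brands'].value_counts()

# Brand names are the Series index, counts are its values,
# so use those directly and give each column an explicit header.
df2 = pd.DataFrame({'brands': brands.index, 'brand_count': brands.values})
print(df2)
```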
The value_counts() function returns a Series containing the counts of unique values. The result is sorted in descending order, so the first element is the most frequently occurring value.
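A small illustration of that ordering, on a hypothetical Series with no tied counts (so the order is unambiguous):

```python
import pandas as pd

s = pd.Series(['b', 'a', 'b', 'c', 'b', 'a'])

# Unique values become the index; their counts are sorted high to low.
counts = s.value_counts()
print(counts)

# The first index label is the most frequent value.
most_common = counts.index[0]
print(most_common)
```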
The concat() function can be used to concatenate two DataFrames by appending the rows of one to the other. The merge() function is equivalent to an SQL JOIN: 'left', 'right', 'outer' and 'inner' joins are all possible.
The pandas DataFrame merge() function combines two DataFrame objects with a database-style join. The join is performed on columns or indexes; if it is done on columns, the indexes are ignored. merge() returns a new DataFrame and leaves the source DataFrames unchanged.
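Applied to this question, the merge route could look like the sketch below (hypothetical sample data). rename_axis names the index holding the brands and reset_index(name=...) names the counts column, which gives the headers the question asks for regardless of how the pandas version names the value_counts result. A left join then keeps every row of df in its original order and attaches the matching count:

```python
import pandas as pd

# Hypothetical sample data standing in for the real df.
df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'brands': ['A', 'C', 'A', 'B'],
})

# Counts Series -> DataFrame with explicit 'brands' / 'brand_count' headers.
counts = (
    df['brands']
    .value_counts()
    .rename_axis('brands')
    .reset_index(name='brand_count')
)

# Left join on the brand column: every row of df keeps its place and gets its count.
result = df.merge(counts, on='brands', how='left')
print(result)
```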
I think the best way is to use map:
df['brand_count'] = df['brands'].map(df['brands'].value_counts())
In my test this was much faster than the groupby method (about 500 times faster on a 15,000-row DataFrame), and it takes only one line.
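Why this works: value_counts() returns a Series indexed by brand, and Series.map() with a Series argument looks up each row's value in that index. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'brands': ['A', 'C', 'A', 'B', 'C', 'C']})

# Each row's brand is looked up in the counts Series' index.
df['brand_count'] = df['brands'].map(df['brands'].value_counts())
print(df)
```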
You want to use transform:
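import numpy as np
import pandas as pd
np.random.seed(0)
# Create dummy data: 10 brand names drawn from brand0..brand5
# (randint's upper bound is exclusive, so 0..5 inclusive).
df = pd.DataFrame({'brands': ['brand{0}'.format(n)
                              for n in np.random.randint(0, 6, 10)]})
# transform returns one value per row, so each row gets the size of its brand group.
df['brand_count'] = df.groupby('brands')['brands'].transform('count')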
>>> df
brands brand_count
0 brand4 1
1 brand5 2
2 brand0 1
3 brand3 4
4 brand3 4
5 brand3 4
6 brand1 1
7 brand3 4
8 brand5 2
9 brand2 1
For reference:
>>> df.brands.value_counts()
brand3 4
brand5 2
brand4 1
brand0 1
brand1 1
brand2 1
Name: brands, dtype: int64
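As a quick consistency check (a sketch, using the brand values from the output above typed in directly rather than generated), the transform answer and the map answer produce the same per-row counts:

```python
import pandas as pd

# Brand values copied from the dummy data shown above.
df = pd.DataFrame({'brands': ['brand4', 'brand5', 'brand0', 'brand3', 'brand3',
                              'brand3', 'brand1', 'brand3', 'brand5', 'brand2']})

# groupby + transform: group size broadcast back to every row.
via_transform = df.groupby('brands')['brands'].transform('count')

# map: look up each brand in the value_counts() Series.
via_map = df['brands'].map(df['brands'].value_counts())

print((via_transform == via_map).all())
```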