Count the number of times each item in a list occurs in a pandas DataFrame column with comma-separated values

I have a list:

citylist = ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Miami']

and a pandas DataFrame df1 with these values:

first   last            city                                email
John    Travis          New York                            a@email.com
Jim     Perterson       San Franciso, Los Angeles           b@email.com
Nancy   Travis          Chicago                             b1@email.com
Jake    Templeton       Los Angeles                         b3@email.com
John    Myers           New York                            b4@email.com
Peter   Johnson         San Franciso, Chicago               b5@email.com
Aby     Peters          Los Angeles                         b6@email.com
Amy     Thomas          San Franciso                        b7@email.com
Jessica Thompson        Los Angeles, Chicago, New York      b8@email.com

I want to count the number of times each city from citylist occurs in the dataframe column 'city':

New York        3       
San Francisco   3
Los Angeles     4
Chicago         3
Miami           0

Currently I have:

dftest = df1.groupby(by='city', as_index=False).agg({'id': pd.Series.nunique})

and it ends up counting "Los Angeles, Chicago, New York" as one unique value.

Is there any way to get the counts as I have shown above? Thanks.

asked Sep 26 '20 15:09

user14262559

1 Answer

Try this:

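Fix the data first (the 'city' column spells the city as 'San Franciso', which does not match the entry in citylist):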

df1['city'] = df1['city'].str.replace('Franciso', 'Francisco')

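Then split each entry on ', ', explode the resulting lists so each city gets its own row, count the values, and reindex by citylist so that cities with no matches show up as 0: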

(df1['city'].str.split(', ')
            .explode()
            .value_counts(sort=False)
            .reindex(citylist, fill_value=0))

Output:

New York         3
San Francisco    3
Los Angeles      4
Chicago          3
Miami            0
Name: city, dtype: int64
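
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes only the 'city' column matters for the count, so the DataFrame is rebuilt with that column alone (values copied from the question, including the 'Franciso' misspelling). The name `counts` is just an illustrative variable, not something from the question.

```python
import pandas as pd

citylist = ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Miami']

# Assumption: only the 'city' column is needed; values are copied from the question.
df1 = pd.DataFrame({
    'city': [
        'New York',
        'San Franciso, Los Angeles',
        'Chicago',
        'Los Angeles',
        'New York',
        'San Franciso, Chicago',
        'Los Angeles',
        'San Franciso',
        'Los Angeles, Chicago, New York',
    ]
})

# Correct the misspelling so it matches the entry in citylist.
df1['city'] = df1['city'].str.replace('Franciso', 'Francisco')

counts = (
    df1['city']
    .str.split(', ')                  # 'A, B' -> ['A', 'B']
    .explode()                        # one city per row
    .value_counts(sort=False)         # occurrences of each city
    .reindex(citylist, fill_value=0)  # order by citylist; absent cities get 0
)

print(counts)
```

Note that the split is on the exact string ', ', so entries with inconsistent spacing around the commas would need stripping first.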

answered Sep 25 '22 14:09

Scott Boston

Tags: pandas, dataframe, csv, grouping
