Count the number of times each item in a list occurs in a pandas DataFrame column with comma-separated values

I have a list:

citylist = ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Miami']

and a pandas DataFrame df1 with these values:

first   last            city                                email
John    Travis          New York                            [email protected]
Jim     Perterson       San Franciso, Los Angeles           [email protected]
Nancy   Travis          Chicago                             [email protected]
Jake    Templeton       Los Angeles                         [email protected]
John    Myers           New York                            [email protected]
Peter   Johnson         San Franciso, Chicago               [email protected]
Aby     Peters          Los Angeles                         [email protected]
Amy     Thomas          San Franciso                        [email protected]
Jessica Thompson        Los Angeles, Chicago, New York      [email protected]

I want to count the number of times each city from citylist occurs in the dataframe column 'city':

New York        3       
San Francisco   3
Los Angeles     4
Chicago         3
Miami           0

Currently I have:

dftest = df1.groupby(by='city', as_index=False).agg({'id': pd.Series.nunique})

but it ends up counting "Los Angeles, Chicago, New York" as one unique value.

Is there any way to get the counts shown above? Thanks
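To see why grouping falls short, here is a minimal reproduction. It assumes df1 is rebuilt from the city column only (first, last and email do not affect the counts), and it uses groupby(...).size() in place of the question's agg({'id': pd.Series.nunique}), since df1 as shown has no id column. Each distinct cell string, including a comma-separated combination, becomes its own group:

```python
import pandas as pd

# Only the 'city' column from the question, misspellings included.
df1 = pd.DataFrame({'city': [
    'New York',
    'San Franciso, Los Angeles',
    'Chicago',
    'Los Angeles',
    'New York',
    'San Franciso, Chicago',
    'Los Angeles',
    'San Franciso',
    'Los Angeles, Chicago, New York',
]})

# Grouping on the raw strings: 'Los Angeles, Chicago, New York' is one key,
# so Los Angeles, Chicago and New York are each undercounted.
raw_counts = df1.groupby('city').size()
print(raw_counts)
```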

asked Sep 26 '20 at 15:09 by user14262559


People also ask

How do I count the number of occurrences in a column in pandas?

Using the size() or count() method with pandas.DataFrame.groupby() gives the number of occurrences of each value in a particular column of the DataFrame.
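A short sketch on a small hypothetical DataFrame (the data and the email column are made up for illustration). size() counts every row in each group, while count() on a selected column skips missing values; Series.value_counts() gives the same per-value totals as size() here:

```python
import pandas as pd

# Hypothetical data: one email is missing (None is treated as NA by pandas).
df = pd.DataFrame({
    'city': ['New York', 'Chicago', 'New York', 'Miami'],
    'email': ['a@example.com', 'b@example.com', None, 'c@example.com'],
})

# size(): number of rows in each group, missing values included.
sizes = df.groupby('city').size()

# count(): number of non-NA values of the selected column in each group.
email_counts = df.groupby('city')['email'].count()

# value_counts(): frequency of each distinct value in the Series.
city_freq = df['city'].value_counts()

print(sizes)
print(email_counts)
print(city_freq)
```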

How do you count the number of repeated values in pandas?

You can count the number of duplicate rows by counting the True values in the pandas.Series returned by duplicated(); the sum() method does this, since True counts as 1. To count the False values instead (the number of non-duplicate rows), invert the Series with the negation operator ~ and then count the True values with sum().
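A minimal sketch of that idea, on hypothetical data where the third row repeats the first. The subset= parameter of duplicated() restricts the comparison to the listed columns:

```python
import pandas as pd

# Hypothetical data: row 2 is an exact copy of row 0.
df = pd.DataFrame({
    'first': ['John', 'Jim', 'John', 'John'],
    'last': ['Travis', 'Perterson', 'Travis', 'Myers'],
})

dup = df.duplicated()              # True for each row that repeats an earlier row
n_duplicates = dup.sum()           # True counts as 1, so this is the duplicate count
n_non_duplicates = (~dup).sum()    # invert with ~ to count the non-duplicate rows

# Compare only the 'first' column: both later "John" rows count as duplicates.
n_dup_first = df.duplicated(subset=['first']).sum()

print(n_duplicates, n_non_duplicates, n_dup_first)
```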

What is the difference between count and Value_counts in pandas?

count() returns the number of valid (non-NA) values in each column of a DataFrame, or in each row with axis=1. value_counts() returns the frequency of each distinct value in a Series.
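The difference shows up clearly on a small hypothetical DataFrame with one missing city: count() reports one number per column, value_counts() one number per distinct value:

```python
import pandas as pd

# Hypothetical data with one missing city.
df = pd.DataFrame({
    'first': ['John', 'Nancy', 'Jake', 'Jessica'],
    'city': ['New York', 'Chicago', None, 'New York'],
})

non_na = df.count()               # per column: number of non-NA cells
freq = df['city'].value_counts()  # per distinct value: occurrences (NA dropped by default)

print(non_na)
print(freq)
```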

Can you write a program to count the number of rows and columns in a DataFrame?

dataframe.index represents the rows and dataframe.columns represents the columns, so len(dataframe.index) and len(dataframe.columns) give the number of rows and columns respectively.
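A quick sketch on hypothetical data; DataFrame.shape returns the same two numbers as a (rows, columns) tuple:

```python
import pandas as pd

# Hypothetical three-row, two-column DataFrame.
df = pd.DataFrame({
    'first': ['John', 'Jim', 'Nancy'],
    'last': ['Travis', 'Perterson', 'Travis'],
})

n_rows = len(df.index)     # one index entry per row
n_cols = len(df.columns)   # one label per column
print(n_rows, n_cols)
print(df.shape)            # (rows, columns)
```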


1 Answer

Try this:

Fix the data first (the DataFrame misspells San Francisco as "San Franciso"):

df1['city'] = df1['city'].str.replace('Franciso', 'Francisco')

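Then split each entry on ', ', explode to one city per row, count, and reindex so that cities with no matches (Miami) show 0: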

(df1['city'].str.split(', ')
            .explode()
            .value_counts(sort=False)
            .reindex(citylist, fill_value=0))

Output:

New York         3
San Francisco    3
Los Angeles      4
Chicago          3
Miami            0
Name: city, dtype: int64
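For a self-contained check, here is the answer's approach run end to end, assuming df1 is rebuilt from the question's city column only (the other columns do not affect the counts). Series.explode needs pandas 0.25 or later. As a second option, not part of the original answer, Series.str.get_dummies(sep=', ') turns each entry into one 0/1 column per city, and summing those columns gives the same counts:

```python
import pandas as pd

citylist = ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Miami']

# Only the 'city' column from the question, misspellings included.
df1 = pd.DataFrame({'city': [
    'New York',
    'San Franciso, Los Angeles',
    'Chicago',
    'Los Angeles',
    'New York',
    'San Franciso, Chicago',
    'Los Angeles',
    'San Franciso',
    'Los Angeles, Chicago, New York',
]})

# Fix the misspelling; regex=False treats the pattern as a literal string.
df1['city'] = df1['city'].str.replace('Franciso', 'Francisco', regex=False)

# The answer's approach: one row per city, count, then align to citylist.
counts = (df1['city'].str.split(', ')
                     .explode()
                     .value_counts(sort=False)
                     .reindex(citylist, fill_value=0))

# Alternative: a 0/1 indicator column per city, summed down the rows.
dummy_counts = (df1['city'].str.get_dummies(sep=', ')
                           .sum()
                           .reindex(citylist, fill_value=0))

print(counts)
print(dummy_counts)
```

Both produce 3, 3, 4, 3, 0 in citylist order. Splitting on ', ' assumes every entry uses a comma followed by exactly one space; splitting on ',' and applying .str.strip() after explode would tolerate irregular spacing. On pandas 2.0 or later, the printed Name: line reads count rather than city, because value_counts now names its result count.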
answered Sep 25 '22 at 14:09 by Scott Boston