
Count by unique pair of columns in pandas [duplicate]

Tags:

python

pandas

I'm trying to figure out how to count the number of rows for each unique pair of column values (ip, useragent), e.g.

import pandas as pd

d = pd.DataFrame({'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'], 'useragent': ['a', 'a', 'b', 'b']})

            ip useragent
0  192.168.0.1         a
1  192.168.0.1         a
2  192.168.0.1         b
3  192.168.0.2         b

To produce:

ip           useragent
192.168.0.1  a            2
192.168.0.1  b            1
192.168.0.2  b            1

Ideas?

barnybug asked Dec 01 '12 13:12





2 Answers

Group by both columns and call size(); this returns the number of rows in each (ip, useragent) group.

d.groupby(['ip', 'useragent']).size() 

produces:

ip           useragent
192.168.0.1  a            2
             b            1
192.168.0.2  b            1
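For anyone who wants to run this end to end, here is a minimal self-contained sketch using the DataFrame from the question. The variable name `counts` is only illustrative. The result is a Series with a two-level index, so a single count can be looked up with an (ip, useragent) tuple.

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# size() returns the number of rows in each (ip, useragent) group
# as a Series indexed by a MultiIndex of the two grouping columns.
counts = d.groupby(['ip', 'useragent']).size()
print(counts)

# Look up the count for one specific pair.
print(counts.loc[('192.168.0.1', 'a')])
```

Only pairs that actually occur in the data appear in the result, which is why ('192.168.0.2', 'a') is absent.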
Matti John answered Sep 22 '22 17:09



print(d.groupby(['ip', 'useragent']).size().reset_index().rename(columns={0:''})) 

gives:

            ip useragent   
0  192.168.0.1         a  2
1  192.168.0.1         b  1
2  192.168.0.2         b  1
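If a named count column is preferable to a blank header, one possible variation is to pass `name` to `reset_index`. This is a sketch, not part of the original answer, and the column name `count` and the variable `counts_df` are my own choices.

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# Series.reset_index(name=...) turns the index levels back into columns
# and gives the values column an explicit name in one step.
counts_df = d.groupby(['ip', 'useragent']).size().reset_index(name='count')
print(counts_df)
```

This yields a flat DataFrame with columns ip, useragent and count, which is usually easier to filter, sort or merge than a Series with a MultiIndex.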

Another nice option might be pandas.crosstab:

print(pd.crosstab(d.ip, d.useragent))
print('\nsome cosmetics:')
print(pd.crosstab(d.ip, d.useragent).reset_index().rename_axis('', axis='columns'))

gives:

useragent    a  b
ip               
192.168.0.1  2  1
192.168.0.2  0  1

some cosmetics:
            ip  a  b
0  192.168.0.1  2  1
1  192.168.0.2  0  1
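To show how the two approaches relate, here is a small sketch (the names `wide` and `ct` are illustrative): unstacking the groupby result with a fill value of 0 should give the same wide table as crosstab. The practical difference is that the wide form reports a 0 for pairs that never occur, while the plain groupby result omits them.

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# Long form: one entry per pair that occurs in the data.
long_counts = d.groupby(['ip', 'useragent']).size()

# Wide form: pivot the useragent level into columns, filling
# pairs that never occur with 0.
wide = long_counts.unstack(fill_value=0)

# crosstab builds the same ip-by-useragent table directly.
ct = pd.crosstab(d.ip, d.useragent)

print(wide)
print(ct)
```

Which form is better depends on the use: the long form stays compact when there are many distinct useragents, while the wide form is convenient for reading off a small table.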
Markus Dutschke answered Sep 22 '22 17:09
