I'm trying to figure out how to count the number of rows for each unique pair of columns (ip, useragent), e.g.
d = pd.DataFrame({'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'], 'useragent': ['a', 'a', 'b', 'b']})

            ip useragent
0  192.168.0.1         a
1  192.168.0.1         a
2  192.168.0.1         b
3  192.168.0.2         b
To produce:
ip           useragent
192.168.0.1  a            2
192.168.0.1  b            1
192.168.0.2  b            1
Ideas?
To find the unique combinations of values across multiple columns, use DataFrame.drop_duplicates(), which drops duplicate rows and returns a DataFrame containing only the unique rows. Note that this tells you which pairs exist and how many distinct pairs there are, not how many rows each pair has.
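As a minimal sketch of that idea, using the DataFrame from the question (the variable names unique_pairs and n_pairs are just illustrative):

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# Keep the first row of each distinct (ip, useragent) pair.
unique_pairs = d.drop_duplicates(subset=['ip', 'useragent'])

# Number of distinct pairs, not the number of rows within each pair.
n_pairs = len(unique_pairs)

print(unique_pairs)
print(n_pairs)
```

This is useful if you only need the list of pairs; for per-pair row counts, groupby is the better fit.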
You can use the nunique() function to count the number of unique values in a pandas DataFrame.
Calling nunique() with default parameters counts the distinct values in each column and returns a pandas Series with one count per column. Because each column is counted separately, this does not give counts per (ip, useragent) pair.
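For illustration, here is what nunique() does with the question's data (per_column is an illustrative name):

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# One count per column: distinct ip values and distinct useragent values.
per_column = d.nunique()

print(per_column)
```

Both columns have two distinct values here, which says nothing about how often each combination occurs.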
Using size() or count() with DataFrame.groupby() gives the number of occurrences of each group: size() counts all rows in each group, while count() counts the non-null values in each of the remaining columns.
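The difference between the two only shows up when values are missing. A small sketch, assuming a hypothetical extra column referrer (not part of the question) that contains one missing value:

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
    # Hypothetical column added only to illustrate missing values.
    'referrer': ['x', None, 'y', 'z'],
})

grouped = d.groupby(['ip', 'useragent'])

# size(): number of rows in each group, missing values included.
sizes = grouped.size()

# count(): number of non-missing 'referrer' values in each group.
counts = grouped['referrer'].count()

print(sizes)
print(counts)
```

For the pair ('192.168.0.1', 'a'), size() reports both rows while count() reports only the one row with a referrer, so size() is the safer choice when the goal is rows per pair.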
If you use groupby, you will get what you want.
d.groupby(['ip', 'useragent']).size()
produces:
ip           useragent
192.168.0.1  a            2
             b            1
192.168.0.2  b            1
print(d.groupby(['ip', 'useragent']).size().reset_index().rename(columns={0:''}))
gives:
            ip useragent   
0  192.168.0.1         a  2
1  192.168.0.1         b  1
2  192.168.0.2         b  1
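If you would rather have a named count column than an empty header, one possible variation is to pass name= to reset_index(). The column name 'count' below is an arbitrary choice, not something from the original answer:

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# size() returns a Series; reset_index(name=...) turns it into a DataFrame
# and names the column holding the counts.
pair_counts = d.groupby(['ip', 'useragent']).size().reset_index(name='count')

print(pair_counts)
```

A named column makes later steps such as sorting or filtering on the count a little easier to read.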
Another nice option might be pandas.crosstab:
print(pd.crosstab(d.ip, d.useragent))
print('\nsome cosmetics:')
print(pd.crosstab(d.ip, d.useragent).reset_index().rename_axis('', axis='columns'))
gives:
useragent    a  b
ip               
192.168.0.1  2  1
192.168.0.2  0  1

some cosmetics:
            ip  a  b
0  192.168.0.1  2  1
1  192.168.0.2  0  1