Counting column values based on values in other columns for Pandas dataframes

I'm trying to count the number of storms in each category for each unique x and y combination. For example, my dataframe looks like:

x   y  year  Category
1   1  1988     3
2   1  1977     1
2   1  1999     2
3   2  1990     4

I want to create a dataframe that looks like:

x   y   Category 1   Category 2   Category 3  Category 4
1   1        0           0            1           0
2   1        1           1            0           0
3   2        0           0            0           1

I have tried various combinations of .groupby() and .count(), but I am still not getting the desired result. The closest I could get is:

df[['x','y','Category']].groupby(['Category']).count()

However, this counts the rows in each category across all x and y values, not per unique (x, y) pair:

Cat       x           y     
1       3773         3773
2       1230         1230
3       604          604
4       266          266
5       50           50
NA      27620        27620
TS      16884        16884

Does anyone know how to do a count operation on one column based on the uniqueness of two other columns in a dataframe?

asked Feb 05 '19 02:02 by Lindsey Nield


1 Answer

pivot_table sounds like what you want. A bit of a hack is to add a column of 1's to count with. This lets pivot_table add 1 for each occurrence of a particular x-y and Category combination. Set this new column as the values parameter of pivot_table and set the aggfunc parameter to np.sum (in recent pandas versions the string 'sum' also works and avoids a FutureWarning about passing np.sum). You'll probably want to set fill_value to 0 as well, so that combinations that never occur show up as 0 rather than NaN:

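import numpy as np
import pandas as pd

# Helper column of 1's: summing it counts the rows in each (x, y, Category) cell
df['count'] = 1
result = df.pivot_table(
    index=['x', 'y'], columns='Category', values='count',
    fill_value=0, aggfunc=np.sum
)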

result:

Category  1  2  3  4
x y                 
1 1       0  0  1  0
2 1       1  1  0  0
3 2       0  0  0  1
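The helper column is not strictly necessary. As an alternative (a swapped-in technique, not the approach above), pd.crosstab counts co-occurrences directly and fills missing combinations with 0. A minimal sketch, assuming the sample rows from the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "x": [1, 2, 2, 3],
    "y": [1, 1, 1, 2],
    "year": [1988, 1977, 1999, 1990],
    "Category": [3, 1, 2, 4],
})

# Rows are the (x, y) pairs, columns are the Category values,
# and each cell counts how many storms fall in that combination.
counts = pd.crosstab([df["x"], df["y"]], df["Category"])
print(counts)
```

This should give the same table as the pivot_table result, without adding a count column to df.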

If you'd rather keep x and y as regular columns and name the count columns Category 1, Category 2, and so on, rename the columns and call reset_index:

result.columns = [f'Category {c}' for c in result.columns]
result = result.reset_index()
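The same final layout can also be reached with groupby(...).size().unstack(), which counts rows per (x, y, Category) group and then moves Category into the columns. This is a different technique from the pivot_table answer, shown as a sketch assuming the sample data from the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "x": [1, 2, 2, 3],
    "y": [1, 1, 1, 2],
    "year": [1988, 1977, 1999, 1990],
    "Category": [3, 1, 2, 4],
})

# size() counts rows in each (x, y, Category) group; unstack() pivots the
# last index level (Category) into columns, filling absent combinations with 0.
counts = df.groupby(["x", "y", "Category"]).size().unstack(fill_value=0)

# Flatten to the requested layout: x and y as ordinary columns and
# count columns labelled "Category 1", "Category 2", ...
counts.columns = [f"Category {c}" for c in counts.columns]
result = counts.reset_index()
print(result)
```

Using fill_value=0 in unstack keeps the counts as integers instead of turning missing cells into NaN floats.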
answered Sep 29 '22 03:09 by busybear