Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping and pivoting DataFrame with additional column for ratio of counts

I have a dataframe that looks like this:

id       status      year 
1        yes         2014
3        no          2013
2        yes         2014
4        no          2014

The actual dataframe is very large with multiple ids and years. I am trying to make a new dataframe that has the percents of 'yes's and 'no's grouped by year.

I was thinking of grouping the dataframe by the year, which would then put the statuses per year in a list and then analyzing the counts of yes's and no's that way, but I was wondering whether there is a more pythonic way to do this?

I would like for the end dataframe to look like this:

year      yes_count     no_count     ratio_yes_to_toal    
2013       0             1             0%
2014       2             1             67%
like image 449
Priya Avatar asked Dec 19 '18 16:12

Priya


People also ask

What is the difference between pivot and pivot table?

Basically, the pivot_table() function is a generalization of the pivot() function that allows aggregation of values — for example, through the len() function in the previous example. Pivot only works — or makes sense — if you need to pivot a table and show values without any aggregation. Here's an example.

What is the difference between pivot table and Groupby in pandas?

What is the difference between the pivot_table and the groupby? The groupby method is generally enough for two-dimensional operations, but pivot_table is used for multi-dimensional grouping operations.

What is pivot_table in pandas?

Pivot table in pandas is an excellent tool to summarize one or more numeric variable based on two other categorical variables. Pivot tables in pandas are popularly seen in MS Excel files. In python, Pivot tables of pandas dataframes can be created using the command: pandas. pivot_table .

How do you reshape a data frame?

You can use the following basic syntax to convert a pandas DataFrame from a wide format to a long format: df = pd. melt(df, id_vars='col1', value_vars=['col2', 'col3', ...]) In this scenario, col1 is the column we use as an identifier and col2, col3, etc.


1 Answers

I'd suggest grouping by year and status, counting, pivoting, and then creating an additional column of the ratio:

df2 = df.groupby(['year', 'status']).count().pivot_table(index="year", columns=["status"]).fillna(0)
df2.columns = df2.columns.get_level_values(1)
df2['ratio'] = df2['yes'] / (df2['yes'] + df2['no'])

Output

status   no  yes     ratio
year                      
2013    1.0  0.0  0.000000
2014    1.0  2.0  0.666667
like image 101
Tim Avatar answered Oct 26 '22 23:10

Tim