
Given a pandas Series that represents frequencies of a value, how can I turn those frequencies into percentages?

Tags:

python

pandas

I was experimenting with the kaggle.com Titanic data set (data on every person on the Titanic) and came up with a gender breakdown like this:

    import pandas as pd

    df = pd.DataFrame({'sex': ['male'] * 577 + ['female'] * 314})
    gender = df.sex.value_counts()
    gender

    male      577
    female    314

I would like to find out the percentage of each gender on the Titanic.

My approach is slightly less than ideal:

    from __future__ import division  # only needed on Python 2
    pcts = gender / gender.sum()
    pcts

    male      0.647587
    female    0.352413

Is there a better (more idiomatic) way?

Tim Stewart asked Jan 11 '13 15:01



People also ask

How do you get percentage in pandas?

You can calculate a percentage of a total in pandas with groupby() and the transform() method. With groupby(), transform() applies a function to each group and returns a result aligned with the original rows, so each value can be divided by its group total. If the percentage is instead computed directly on the summarized DataFrame, the result is calculated using all the data.
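A minimal sketch of the groupby() plus transform() approach, using a small made-up DataFrame (the column names `pclass` and `fare` and all values are assumptions chosen for illustration, not taken from the Titanic data):

```python
import pandas as pd

# Hypothetical data: two passenger classes with a few fares each.
df = pd.DataFrame({
    'pclass': [1, 1, 2, 2, 2],
    'fare': [80.0, 20.0, 10.0, 30.0, 10.0],
})

# transform('sum') returns the group total repeated on every row of the group,
# so the result lines up with the original rows.
group_total = df.groupby('pclass')['fare'].transform('sum')
df['pct_of_class'] = df['fare'] / group_total * 100

# For comparison: percentage of the grand total, ignoring the groups.
df['pct_of_all'] = df['fare'] / df['fare'].sum() * 100

print(df)
```

Within each class the `pct_of_class` values add up to 100, while `pct_of_all` adds up to 100 across the whole frame.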

How are pandas frequencies calculated?

In pandas you can count how often each value occurs in a DataFrame column with the Series.value_counts() method. Alternatively, if you have a SQL background, you can get the same counts with groupby() and count().
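The two routes can be compared side by side. This is a small sketch on a made-up five-row column (the data is an assumption for illustration):

```python
import pandas as pd

# Hypothetical column with three 'male' and two 'female' entries.
df = pd.DataFrame({'sex': ['male', 'female', 'male', 'male', 'female']})

# value_counts() orders the result by count, largest first.
by_value_counts = df['sex'].value_counts()

# groupby().count() orders the result by group key and, like SQL's
# COUNT(column), counts only non-null entries.
by_groupby = df.groupby('sex')['sex'].count()

print(by_value_counts)
print(by_groupby)
```

Both give the same numbers; only the ordering of the result differs.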

How do I convert a count to a percent in Python?

To calculate a percentage in Python, use the division operator (/) to get the quotient from two numbers and then multiply this quotient by 100 using the multiplication operator (*) to get the percentage.

What is freq in pandas?

Series.dt can be used to access the values of a Series as datetime-like and return several properties. The Series.dt.freq attribute returns the time series frequency applied to the given Series object, if any; otherwise it returns None.


1 Answer

This is already built into pandas, in value_counts() itself, so there is no need to calculate it by hand :)

Just type:

df.sex.value_counts(normalize=True) 

which gives exactly the desired output.

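Please note that value_counts() excludes NA values by default, so the proportions are relative to the non-NA entries and may not reflect each value's share of all rows. Pass dropna=False to count NA as its own category. See here: http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.Series.value_counts.html (A column of a DataFrame is a Series)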
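As an illustration of the points above, here is a short sketch that turns the proportions into percentages and shows the effect of missing values. The counts match the question, but the 9 missing entries are an assumption added only to demonstrate the NA handling:

```python
import numpy as np
import pandas as pd

# Same counts as in the question, plus 9 hypothetical missing entries.
sex = pd.Series(['male'] * 577 + ['female'] * 314 + [np.nan] * 9)

# Default behaviour: NA is dropped, so shares are relative to the
# 891 non-missing rows.
shares = sex.value_counts(normalize=True)

# Multiply by 100 to express the shares as percentages.
percentages = (shares * 100).round(2)

# dropna=False keeps NaN as its own category, so shares are relative
# to all 900 rows.
shares_with_na = sex.value_counts(normalize=True, dropna=False)

print(percentages)
print(shares_with_na)
```

Multiplying by 100 is only a presentation choice; the normalized Series itself holds fractions between 0 and 1.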

fanfabbb answered Oct 09 '22 22:10
