I was experimenting with the kaggle.com Titanic data set (data on every person on the Titanic) and came up with a gender breakdown like this:
import pandas as pd
df = pd.DataFrame({'sex': ['male'] * 577 + ['female'] * 314})
gender = df.sex.value_counts()
gender
# male      577
# female    314
I would like to find out the percentage of each gender on the Titanic.
My approach is slightly less than ideal:
from __future__ import division  # Python 2 only (must come first in a script); in Python 3, / is already true division
pcts = gender / gender.sum()
pcts
# male      0.647587
# female    0.352413
Is there a better (more idiomatic) way?
You can also calculate percentages of the total with groupby() and the transform() method of the grouped object. transform() applies a function to each group and returns a result with the same index as the original DataFrame, so every row receives its group's value; dividing that by the overall total gives each group's share of all the data.
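A minimal sketch of the groupby()/transform() approach, assuming the same toy `df` as in the question; the `pct_of_total` column name is hypothetical. transform('count') broadcasts each group's row count back onto every row of that group, and dividing by len(df) turns it into a share of the whole.

```python
import pandas as pd

# Same toy data as in the question: 577 males, 314 females.
df = pd.DataFrame({'sex': ['male'] * 577 + ['female'] * 314})

# Group the 'sex' column by itself; transform('count') returns, for every row,
# the number of rows in that row's group, aligned with the original index.
group_sizes = df['sex'].groupby(df['sex']).transform('count')

# Divide by the total number of rows to get each row's group share of all data.
df['pct_of_total'] = group_sizes / len(df)

# One row per group is enough to read off the result.
print(df.drop_duplicates('sex'))
```

This row-level form is handy when the share has to sit next to the original rows; for a one-line-per-group summary, value_counts() (below) is simpler.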
In pandas you can count how often each value occurs in a DataFrame column with the Series.value_counts() method. Alternatively, if you have a SQL background, you can get the same counts with groupby() followed by count() or size().
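A short comparison of the two counting routes, assuming the question's toy data. It uses groupby().size() rather than count(), since size() counts every row in a group while count() counts non-missing values per column; with no missing data they agree.

```python
import pandas as pd

df = pd.DataFrame({'sex': ['male'] * 577 + ['female'] * 314})

# value_counts(): one count per distinct value, sorted by count (descending).
counts_vc = df['sex'].value_counts()

# SQL-style equivalent of "SELECT sex, COUNT(*) ... GROUP BY sex":
# size() counts rows per group; the result is sorted by the group key.
counts_gb = df.groupby('sex').size()

print(counts_vc)
print(counts_gb)

# Either result converts to shares of the total the same way.
print(counts_gb / counts_gb.sum())
```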
To calculate a percentage in Python, divide the part by the total with the division operator (/), then multiply the quotient by 100 with the multiplication operator (*).
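Applied to the question's counts, a minimal sketch: divide by the total, then scale by 100. Rounding is only for display.

```python
import pandas as pd

df = pd.DataFrame({'sex': ['male'] * 577 + ['female'] * 314})
gender = df['sex'].value_counts()

# Fraction of the total, multiplied by 100 to express it as a percentage.
pct = gender / gender.sum() * 100

# Round for display only; pct itself keeps full precision.
print(pct.round(2))
```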
This is already implemented in pandas, right in value_counts() itself. No need to calculate it yourself :)
Just type:
df.sex.value_counts(normalize=True)
which gives exactly the desired output.
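A quick check, using the question's toy data, that normalize=True gives the same numbers as the manual `gender / gender.sum()` approach.

```python
import pandas as pd

df = pd.DataFrame({'sex': ['male'] * 577 + ['female'] * 314})

# normalize=True returns relative frequencies instead of raw counts.
shares = df.sex.value_counts(normalize=True)
print(shares)

# The manual approach from the question, for comparison.
gender = df.sex.value_counts()
manual = gender / gender.sum()
```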
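Please note that value_counts() excludes NA values by default (dropna=True), so missing entries are left out of both the counts and the total the proportions are computed against; pass dropna=False to count them as a separate category. See here: http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.Series.value_counts.html (a column of a DataFrame is a Series)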
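To see the effect of dropna, here is a sketch on a hypothetical variant of the data with two missing entries added (the real Kaggle column may or may not contain any). In current pandas versions, the default shares are relative to the non-missing rows only, while dropna=False makes NaN its own category and uses all rows as the total.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's counts plus two missing values.
s = pd.Series(['male'] * 577 + ['female'] * 314 + [np.nan, np.nan])

# Default dropna=True: NaN is ignored, so shares are out of the 891 non-missing rows.
without_na = s.value_counts(normalize=True)

# dropna=False: NaN is counted as a category, so shares are out of all 893 rows.
with_na = s.value_counts(normalize=True, dropna=False)

print(without_na)
print(with_na)
```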