I have one dataframe df, with two columns: Script (with text) and Speaker.
Script  Speaker
aze     Speaker 1
art     Speaker 2
ghb     Speaker 3
jka     Speaker 1
tyc     Speaker 1
avv     Speaker 2
bhj     Speaker 1
And I have the following list: L = ['a','b','c']
With the following code,
df2 = (df.set_index('Speaker')['Script'].str.findall('|'.join(L))
         .str.join('|')
         .str.get_dummies()
         .groupby(level=0).sum())  # sum(level=0) was removed in pandas 2.0
print(df2)
I obtain this dataframe df2:
Speaker    a  b  c
Speaker 1  2  1  1
Speaker 2  2  0  0
Speaker 3  0  1  0
Which line can I add to my code so that each row of df2 shows every value as a percentage of that speaker's total, giving the following dataframe df3:
Speaker    a     b     c
Speaker 1  50%   25%   25%
Speaker 2  100%  0     0
Speaker 3  0     100%  0
You can calculate a percentage of a total in pandas with groupby() and the transform() method. transform() applies a function to each group and returns a result with the same shape as the input, so each group's total is lined up with every row of that group. Dividing each value by its group total then gives the percentage within the group.
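As a rough illustration of the groupby() and transform() idea, here is a minimal sketch. The long-format frame `counts` and its column names (`letter`, `n`, `pct`) are made up for this example and are not part of the question.

```python
import pandas as pd

# Hypothetical long-format counts: one row per (speaker, letter) pair
counts = pd.DataFrame({
    "Speaker": ["Speaker 1", "Speaker 1", "Speaker 1", "Speaker 2"],
    "letter": ["a", "b", "c", "a"],
    "n": [2, 1, 1, 2],
})

# transform("sum") returns each group's total, repeated on every row of that group
totals = counts.groupby("Speaker")["n"].transform("sum")

# Share of each row within its speaker's total, as a number between 0 and 100
counts["pct"] = counts["n"] / totals * 100
print(counts)
```

Because transform() keeps the original row order and length, the division needs no merge or reindexing step.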
Another approach converts each value in the dataframe to the percentage it represents across its row. First create a 'total' column holding the sum of each row, then use pipe and a lambda to divide each value in the row by the 'total' column and format the result as a percentage.
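The 'total' column approach could look roughly like the sketch below. It rebuilds the count table from the question by hand; the name `pct` and the rounding to whole percentages are choices made for this example.

```python
import pandas as pd

# Count table from the question, typed in by hand
df2 = pd.DataFrame(
    {"a": [2, 2, 0], "b": [1, 0, 1], "c": [1, 0, 0]},
    index=pd.Index(["Speaker 1", "Speaker 2", "Speaker 3"], name="Speaker"),
)

pct = (
    df2.assign(total=df2.sum(axis=1))  # add a per-row total column
       # divide every count column by 'total', row by row, and drop 'total'
       .pipe(lambda d: d.drop(columns="total").div(d["total"], axis=0))
       .mul(100)
       .round()
       .astype(int)
       .astype(str)
       .add("%")                       # format as a percentage string
)
print(pct)
```

The temporary 'total' column is dropped inside the lambda so it does not end up as a column of the result.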
To calculate a percentage in Python, divide one number by the other with the division operator (/) and then multiply the quotient by 100 with the multiplication operator (*).
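In plain Python the same arithmetic is a one-liner. The helper name `percentage` is hypothetical and only used here for illustration.

```python
def percentage(part, whole):
    """Return part as a percentage of whole (hypothetical helper)."""
    return part / whole * 100

# Speaker 1 has 2 'a' matches out of 4 matches in total
share = percentage(2, 4)
label = f"{share:.0f}%"  # format without decimals and append a percent sign
print(label)
```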
To find the percentage of missing values in each column of an R data frame, use the colMeans function together with the is.na function. This gives the mean of the missing-value indicator for each column, which can then be multiplied by 100 to get the percentage.
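You could divide each row by its sum (the sum along axis 1), then cast to string and add %:
out = (df.set_index('Speaker')['Script'].str.findall('|'.join(L))
         .str.join('|')
         .str.get_dummies()
         .groupby(level=0).sum())  # sum(level=0) was removed in pandas 2.0
# Divide every row by its own total, then format as a percentage string
out.div(out.sum(axis=1), axis=0).mul(100).astype(int).astype(str).add('%')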
           a     b     c
Speaker
Speaker 1  50%   25%   25%
Speaker 2  100%  0%    0%
Speaker 3  0%    100%  0%
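For anyone who wants to reproduce this end to end, here is a self-contained sketch that rebuilds the question's data by hand. It assumes a recent pandas, so it uses groupby(level=0).sum() and DataFrame.div with axis=0 for the row-wise division; the name `df3` follows the question.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "Script": ["aze", "art", "ghb", "jka", "tyc", "avv", "bhj"],
    "Speaker": ["Speaker 1", "Speaker 2", "Speaker 3", "Speaker 1",
                "Speaker 1", "Speaker 2", "Speaker 1"],
})
L = ["a", "b", "c"]

# One indicator column per letter, summed per speaker
out = (df.set_index("Speaker")["Script"].str.findall("|".join(L))
         .str.join("|")
         .str.get_dummies()
         .groupby(level=0).sum())

# Divide each row by its own total; astype(int) truncates any decimals
df3 = out.div(out.sum(axis=1), axis=0).mul(100).astype(int).astype(str).add("%")
print(df3)
```

Note that astype(int) truncates rather than rounds, so a share such as 66.7 would show as 66%; calling .round() first is one way to change that.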