I have a pandas DataFrame that contains everyone's score for every day. I want to add one extra column that counts how many times each person had the highest score of the day. More than one person can share the top score on a given day, and some values are NaN.
import pandas as pd
import numpy as np
data = np.array([['','day1','day2','day3','day4','day5'],
['larry',1,4,7,3,5],
['niko',2,-1,3,6,4],
['tin',np.nan,5,5, 6,7]])
df = pd.DataFrame(data=data[1:,1:],
index=data[1:,0],
columns=data[0,1:])
print(df)
Output:
      day1 day2 day3 day4 day5
larry    1    4    7    3    5
niko     2   -1    3    6    4
tin    nan    5    5    6    7
The expected result is (larry: 1 time, niko: 2 times, tin: 3 times):
       times_of_top day1 day2 day3 day4 day5
larry             1    1    4    7    3    5
niko              2    2   -1    3    6    4
tin               3  nan    5    5    6    7
niko has the highest score on day1 and day4, so his times_of_top is 2. tin has the highest score on day2, day4 and day5, so his times_of_top is 3.
Using the size() or count() method with pandas.DataFrame.groupby() gives the number of occurrences of each value in a particular column of the DataFrame.
To count how many times each value appears, use the pandas Series.value_counts() method. The mode() method can then be used to get the most frequently occurring value.
In pandas, you can get the frequency of each value in a DataFrame column with Series.value_counts(). Alternatively, if you have a SQL background, you can get the same counts with groupby() and count().
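As a small illustration of the counting methods described above, here is a minimal sketch. The `winners` Series and its values are made up for this example (they are an assumption, not the question's data); each entry stands for the name of one day's top scorer.

```python
import pandas as pd

# Hypothetical data: the name of the top scorer on each of five days.
winners = pd.Series(["tin", "niko", "tin", "larry", "tin"], name="winner")

# Series.value_counts() counts how often each distinct value occurs,
# ordered from most to least frequent.
counts = winners.value_counts()

# The same counts via groupby() and size(), ordered by the group key instead.
grouped = winners.groupby(winners).size()

# Series.mode() returns the most frequent value(s) as a Series.
most_common = winners.mode()

print(counts)
print(grouped)
print(most_common)
```

Note that this only works once you already have one "winner" per row. The question's data allows ties and NaN values, so a per-day comparison against the maximum is still needed first.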
One way, using pandas.DataFrame.stack and count:
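# The sample data are stored as strings (object dtype), so convert to float first.
df = df.astype(float)
# Keep only the cells equal to their day's maximum, then count them per person.
# groupby(level=0).count() replaces Series.count(level=0), which pandas 2.0 removed.
df["times_of_top"] = df[df == df.max()].stack().groupby(level=0).count()
print(df)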
Output:
       day1  day2  day3  day4  day5  times_of_top
larry   1.0   4.0   7.0   3.0   5.0             1
niko    2.0  -1.0   3.0   6.0   4.0             2
tin     NaN   5.0   5.0   6.0   7.0             3
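An alternative that avoids stack altogether is to build a boolean mask and sum it row-wise. This is a sketch of a different technique, not the method used in the answer above. It assumes the scores are already numeric, so the frame is rebuilt here directly as floats, and `is_top` is just an illustrative variable name.

```python
import numpy as np
import pandas as pd

# Same scores as in the question, built directly as numeric columns.
df = pd.DataFrame(
    {
        "day1": [1, 2, np.nan],
        "day2": [4, -1, 5],
        "day3": [7, 3, 5],
        "day4": [3, 6, 6],
        "day5": [5, 4, 7],
    },
    index=["larry", "niko", "tin"],
)

# df.max() gives each day's top score (NaN is skipped by default).
# df.eq(...) marks the cells that match it; NaN never compares equal.
is_top = df.eq(df.max())

# Summing the boolean mask across columns counts the top days per person.
times_of_top = is_top.sum(axis=1)

# Put the new column first, as in the expected result in the question.
df.insert(0, "times_of_top", times_of_top)
print(df)
```

With this data the counts come out as larry 1, niko 2 and tin 3. A person who never has the top score gets 0 here rather than a missing value.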