Suppose I have a pandas DataFrame that looks something like this:
Factor_1 Factor_2 Factor_3 Factor_4 Factor_5
A B A Nan Nan
B D F A Nan
F A D B A
It has 5 columns that contain different factors. I would like to count how many times each factor appears in the DataFrame, but without double counting within a row: if a value appears more than once in the same row, it should only count as 1. For example, if one row has A, B, C, A, A, only 1 A would be counted. The expected output would be this:
Factor Count
A 3
B 3
D 2
F 2
Nan 2
I used some code I was helped with:
df.stack(dropna=False).value_counts(dropna=False)
I was using an if to drop the double counts, but I would like to know if there is a practical and simple way to do this, like the code above, without an if, because what I am doing now is not efficient.
You can use Series.unique + Series.value_counts (assuming pandas is imported as pd and NumPy as np):
s = pd.Series(np.hstack(df.T.apply(pd.Series.unique))).value_counts(dropna=False)
B 3
A 3
F 2
D 2
NaN 2
dtype: int64
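As an alternative sketch of the same idea (not the method from the answer above), you could reshape the frame to long form with melt, drop repeated values within each row, and then count. The DataFrame below is rebuilt from the sample in the question; the names long_df, deduped, and counts are only illustrative, and the code assumes the index is unnamed so that reset_index produces a column called "index".

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (missing cells as NaN)
df = pd.DataFrame(
    {
        "Factor_1": ["A", "B", "F"],
        "Factor_2": ["B", "D", "A"],
        "Factor_3": ["A", "F", "D"],
        "Factor_4": [np.nan, "A", "B"],
        "Factor_5": [np.nan, np.nan, "A"],
    }
)

# Long form: one line per (row label, column name, value); NaN cells are kept
long_df = df.reset_index().melt(id_vars="index", value_name="factor")

# Keep each value at most once per original row (NaN is treated as a value too)
deduped = long_df.drop_duplicates(subset=["index", "factor"])

# Count in how many rows each factor appears, including NaN
counts = deduped["factor"].value_counts(dropna=False)
print(counts)
```

On the sample data this should count A and B in 3 rows each, and D, F, and NaN in 2 rows each, matching the expected output in the question. The order of tied values in the printed result may differ.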