I have a python-pandas-DataFrame in which first column is "user_id"
and rest of the columns are tags("Tag_0"
to "Tag_122"
).
I have the data in the following format:
UserId Tag_0 Tag_1 7867688 0 5 7867688 0 3 7867688 3 0 7867688 3.5 3.5 7867688 4 4 7867688 3.5 0
My aim is to achieve Sum(Tag)/Count(NonZero(Tags))
for each user_id
df.groupby('user_id').sum()
, gives me sum(tag)
, however I am clueless about counting non zero values
Is it possible to achieve Sum(Tag)/Count(NonZero(Tags))
in one command?
In MySQL I could achieve this as follows:-
select user_id, sum(tag)/count(nullif(tag,0)) from table group by 1
Any help shall be appreciated.
To count the number of occurrences in e.g. a column in a dataframe you can use Pandas value_counts() method. For example, if you type df['condition']. value_counts() you will get the frequency of each unique value in the column “condition”.
Using Count() You can count the number of rows in the dataframe using the count() function as well. count() will not count the NA or missing values. Hence, you can use this when you want to count only the columns with non-empty values.
My favorite way of getting number of nonzeros in each column is
df.astype(bool).sum(axis=0)
For the number of non-zeros in each row use
df.astype(bool).sum(axis=1)
(Thanks to Skulas)
If you have nans in your df you should make these zero first, otherwise they will be counted as 1.
df.fillna(0).astype(bool).sum(axis=1)
(Thanks to SirC)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With