Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting non zero values in each column of a dataframe in python

I have a python-pandas-DataFrame in which first column is "user_id" and rest of the columns are tags("Tag_0" to "Tag_122").

I have the data in the following format:

UserId  Tag_0   Tag_1 7867688 0   5 7867688 0   3 7867688 3   0 7867688 3.5 3.5 7867688 4   4 7867688 3.5 0 

My aim is to achieve Sum(Tag)/Count(NonZero(Tags)) for each user_id

df.groupby('user_id').sum(), gives me sum(tag), however I am clueless about counting non zero values

Is it possible to achieve Sum(Tag)/Count(NonZero(Tags)) in one command?

In MySQL I could achieve this as follows:-

select user_id, sum(tag)/count(nullif(tag,0)) from table group by 1 

Any help shall be appreciated.

like image 788
Harsh Singal Avatar asked Sep 26 '14 07:09

Harsh Singal


People also ask

How do you count values in a column in a DataFrame Python?

To count the number of occurrences in e.g. a column in a dataframe you can use Pandas value_counts() method. For example, if you type df['condition']. value_counts() you will get the frequency of each unique value in the column “condition”.

How do you find the number of non empty numbers in a column of a DataFrame?

Using Count() You can count the number of rows in the dataframe using the count() function as well. count() will not count the NA or missing values. Hence, you can use this when you want to count only the columns with non-empty values.


1 Answers

My favorite way of getting number of nonzeros in each column is

df.astype(bool).sum(axis=0) 

For the number of non-zeros in each row use

df.astype(bool).sum(axis=1) 

(Thanks to Skulas)

If you have nans in your df you should make these zero first, otherwise they will be counted as 1.

df.fillna(0).astype(bool).sum(axis=1) 

(Thanks to SirC)

like image 69
The Unfun Cat Avatar answered Sep 18 '22 02:09

The Unfun Cat