Counting non-zero values in each column of a DataFrame in Python

Tags:

python

pandas

dataframe

I have a pandas DataFrame in which the first column is "user_id" and the rest of the columns are tags ("Tag_0" to "Tag_122").

I have the data in the following format:

UserId   Tag_0  Tag_1
7867688  0      5
7867688  0      3
7867688  3      0
7867688  3.5    3.5
7867688  4      4
7867688  3.5    0

My aim is to compute Sum(Tag)/Count(NonZero(Tag)) for each user_id.

df.groupby('user_id').sum() gives me sum(tag), but I don't know how to count the non-zero values.

Is it possible to compute Sum(Tag)/Count(NonZero(Tag)) in one command?

In MySQL I could achieve this as follows:

select user_id, sum(tag)/count(nullif(tag,0)) from table group by 1

Any help would be appreciated.


asked Sep 26 '14 07:09

Harsh Singal

1 Answer

My favorite way of getting the number of non-zeros in each column is

df.astype(bool).sum(axis=0)

For the number of non-zeros in each row use

df.astype(bool).sum(axis=1)

(Thanks to Skulas)
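As a rough illustration of both calls, here is a small sketch using the two tag columns from the question. The user_id column is left out on purpose (an assumption about how you would prepare the data), since a non-zero id would itself be counted as a non-zero value.

```python
import pandas as pd

# Sample data from the question, tag columns only (user_id omitted on purpose).
df = pd.DataFrame({
    "Tag_0": [0, 0, 3, 3.5, 4, 3.5],
    "Tag_1": [5, 3, 0, 3.5, 4, 0],
})

# Any non-zero number converts to True and zero to False,
# so summing the booleans counts the non-zero entries.
nonzero_per_column = df.astype(bool).sum(axis=0)
nonzero_per_row = df.astype(bool).sum(axis=1)

print(nonzero_per_column.to_dict())  # {'Tag_0': 4, 'Tag_1': 4}
print(nonzero_per_row.tolist())      # [1, 1, 1, 2, 2, 1]
```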

If you have NaNs in your DataFrame, replace them with zero first; otherwise each NaN is counted as a non-zero value, because NaN converts to True.

df.fillna(0).astype(bool).sum(axis=1)

(Thanks to SirC)
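To get back to the per-user ratio asked about in the question, one possible way to combine the counting trick above with groupby is sketched below. The column name user_id follows the question's text, and the second user (1234567) with a NaN is hypothetical data added only to show the grouping and the fillna step.

```python
import numpy as np
import pandas as pd

# First six rows are from the question; the last two rows are hypothetical.
df = pd.DataFrame({
    "user_id": [7867688] * 6 + [1234567] * 2,
    "Tag_0": [0, 0, 3, 3.5, 4, 3.5, 2, np.nan],
    "Tag_1": [5, 3, 0, 3.5, 4, 0, 0, 4],
})

# Work on the tag columns only and treat NaN as zero.
tags = df.drop(columns="user_id").fillna(0)

# Per-user sum of each tag.
sums = tags.groupby(df["user_id"]).sum()

# Per-user count of non-zero entries of each tag.
counts = tags.astype(bool).groupby(df["user_id"]).sum()

# Sum(Tag) / Count(NonZero(Tag)) for each user_id.
result = sums / counts
print(result)
```

For user 7867688 this should give 14 / 4 = 3.5 for Tag_0 and 15.5 / 4 = 3.875 for Tag_1. If a user has no non-zero values for a tag, the division is 0 / 0 and the result is NaN. Replacing zeros with NaN and taking the grouped mean, df.replace(0, np.nan).groupby('user_id').mean(), should give the same numbers, since mean skips NaN by default.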

answered Sep 18 '22 02:09

The Unfun Cat
