python pandas conditional count across columns

I have a dataframe (called panel[xyz]) containing only 1, 0 and -1. The dimensions are: rows 0:10 and columns a:j.

I would like to create another dataframe (df) which has the same vertical axis but only 3 columns:

col_1 = count of all non-zero values (1s and -1s)
col_2 = count of all 1s
col_3 = count of all -1s

I found this in searching SO:

df[col_1] = (pan[xyz]['a','b','c','d','e'] > 0).count(axis=1)

...and have tried many different iterations, but I cannot get the conditional (> 0) to distinguish between the different values in pan[xyz]; the count is always 5.

Any help would be much appreciated.

Edit:

pan[xyz] =

.	'a'	'b'	'c'	'd'	'e'	'f'	'g'	'h'	'i'	'j'
0	1	0	0	-1	0	0	-1	0	1	0
1	0	1	0	0	0	1	0	0	0	-1
2	1	0	0	0	0	-1	0	0	0	0
3	0	-1	0	0	0	0	0	1	0	0
4	0	0	0	1	0	0	-1	0	0	-1

df should be =

.	col_1	col_2	col_3
0	4	2	2
1	3	2	1
2	2	1	1
3	2	1	1
4	3	1	2

But this is what I get for col_1:

df = (panel[xyz] > 0).count(axis=1)

df
Out[129]: 
0    10
1    10
2    10
3    10
4    10
dtype: int64
asked Apr 10 '15 by MJS



1 Answer

I'm just doing this with a flat dataframe, but it's the same for a panel. You can do it one of two ways. The first way is what you did; just change count() to sum():

( df > 0 ).sum(axis=1)

The result of the comparison is a boolean dataframe; count() counts every non-null cell, so True and False both get counted, whereas sum() treats True as 1 and False as 0, which is what you were expecting.
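As a quick illustration of the difference (using a small stand-in frame rather than the asker's actual data):

import pandas as pd

# Small stand-in frame with the same kind of 1 / 0 / -1 values
df = pd.DataFrame({'a': [1, 0, -1], 'b': [0, 1, 1]})
mask = df > 0              # boolean dataframe

print(mask.count(axis=1))  # counts every non-null cell, True or False -> 2, 2, 2
print(mask.sum(axis=1))    # True counts as 1, False as 0 -> 1, 1, 1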

But a more standard way to do it would be like this:

df[ df > 0 ].count(axis=1)

While the former method was based on a dataframe of booleans, the latter looks like this:

df[ df > 0 ]

    a   b   c   d   e   f   g   h   i   j
0   1 NaN NaN NaN NaN NaN NaN NaN   1 NaN
1 NaN   1 NaN NaN NaN   1 NaN NaN NaN NaN
2   1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN   1 NaN NaN
4 NaN NaN NaN   1 NaN NaN NaN NaN NaN NaN

In this case it doesn't really matter which method you use, but in general the latter is going to be better, because you can do more with it. For example, with the former method (which has binary outcomes by design), all you can really do is count, but in the latter method you can count, sum, multiply, etc.

The potential usefulness of this may be more obvious for the case of df != 0, where there are more than two possible values:

df[ df != 0 ]

    a   b   c   d   e   f   g   h   i   j
0   1 NaN NaN  -1 NaN NaN  -1 NaN   1 NaN
1 NaN   1 NaN NaN NaN   1 NaN NaN NaN  -1
2   1 NaN NaN NaN NaN  -1 NaN NaN NaN NaN
3 NaN  -1 NaN NaN NaN NaN NaN   1 NaN NaN
4 NaN NaN NaN   1 NaN NaN  -1 NaN NaN  -1
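Putting it together for the original question, here is a minimal sketch of how the three requested columns could be assembled (panel_xyz is just a flat-dataframe stand-in for the asker's pan[xyz], built from the rows shown in the question):

import pandas as pd

# Stand-in for pan[xyz] (the five rows from the question)
panel_xyz = pd.DataFrame(
    [[1, 0, 0, -1, 0, 0, -1, 0, 1, 0],
     [0, 1, 0, 0, 0, 1, 0, 0, 0, -1],
     [1, 0, 0, 0, 0, -1, 0, 0, 0, 0],
     [0, -1, 0, 0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0, 0, -1, 0, 0, -1]],
    columns=list('abcdefghij'))

out = pd.DataFrame({
    'col_1': (panel_xyz != 0).sum(axis=1),   # all non-zero values
    'col_2': (panel_xyz == 1).sum(axis=1),   # all 1s
    'col_3': (panel_xyz == -1).sum(axis=1),  # all -1s
})
print(out)  # matches the desired df in the question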
answered Oct 13 '22 by JohnE