python pandas conditional count across columns

I have a dataframe (called panel[xyz]) containing only 1, 0 and -1. The dimensions are: rows 0:10 and columns a:j.

I would like to create another dataframe (df) which has the same vertical axis but only 3 columns:

col_1 = count of all non-zero values (1s and -1s)
col_2 = count of all 1s
col_3 = count of all -1s

I found this in searching SO:

df[col_1] = (pan[xyz]['a','b','c','d','e'] > 0).count(axis=1)

...and have tried many different iterations, but I cannot get the conditional (> 0) to distinguish between the different values in pan[xyz]; the count is always 5.

Any help would be much appreciated.

Edit:

pan[xyz] =

.	'a'	'b'	'c'	'd'	'e'	'f'	'g'	'h'	'i'	'j'
0	1	0	0	-1	0	0	-1	0	1	0
1	0	1	0	0	0	1	0	0	0	-1
2	1	0	0	0	0	-1	0	0	0	0
3	0	-1	0	0	0	0	0	1	0	0
4	0	0	0	1	0	0	-1	0	0	-1

df should be =

.	col_1	col_2	col_3
0	4	2	2
1	3	2	1
2	2	1	1
3	2	1	1
4	3	1	2

But this is what I get for col_1:

df = (panel[xyz] > 0).count(axis=1)

df
Out[129]: 
0    10
1    10
2    10
3    10
4    10
dtype: int64
asked Apr 10 '15 by MJS



1 Answer

I'm just doing this with a flat dataframe, but it's the same for a panel. You can do it one of two ways. The first way is what you did; just change count() to sum():

( df > 0 ).sum(axis=1)

The result of the comparison is a boolean dataframe; count() counts every non-null cell, so True and False both get counted, whereas sum() treats True as 1 and False as 0, which is what you were expecting.
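As a quick illustration of the difference (using a small stand-in frame rather than the asker's actual data):

import pandas as pd

# Small stand-in frame with the same kind of 1 / 0 / -1 values
df = pd.DataFrame({'a': [1, 0, -1], 'b': [0, 1, 1]})
mask = df > 0              # boolean dataframe

print(mask.count(axis=1))  # counts every non-null cell, True or False -> 2, 2, 2
print(mask.sum(axis=1))    # True counts as 1, False as 0 -> 1, 1, 1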

But a more standard way to do it would be like this:

df[ df > 0 ].count(axis=1)

While the former method was based on a dataframe of booleans, the latter looks like this:

df[ df > 0 ]

    a   b   c   d   e   f   g   h   i   j
0   1 NaN NaN NaN NaN NaN NaN NaN   1 NaN
1 NaN   1 NaN NaN NaN   1 NaN NaN NaN NaN
2   1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN   1 NaN NaN
4 NaN NaN NaN   1 NaN NaN NaN NaN NaN NaN

In this case it doesn't really matter which method you use, but in general the latter is going to be better, because you can do more with it. For example, with the former method (which has binary outcomes by design), all you can really do is count, but in the latter method you can count, sum, multiply, etc.

The potential usefulness of this may be more obvious for the case of df != 0, where there are more than two possible values:

df[ df != 0 ]

    a   b   c   d   e   f   g   h   i   j
0   1 NaN NaN  -1 NaN NaN  -1 NaN   1 NaN
1 NaN   1 NaN NaN NaN   1 NaN NaN NaN  -1
2   1 NaN NaN NaN NaN  -1 NaN NaN NaN NaN
3 NaN  -1 NaN NaN NaN NaN NaN   1 NaN NaN
4 NaN NaN NaN   1 NaN NaN  -1 NaN NaN  -1
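Putting it together for the original question, here is a minimal sketch of how the three requested columns could be assembled (panel_xyz is just a flat-dataframe stand-in for the asker's pan[xyz], built from the rows shown in the question):

import pandas as pd

# Stand-in for pan[xyz] (the five rows from the question)
panel_xyz = pd.DataFrame(
    [[1, 0, 0, -1, 0, 0, -1, 0, 1, 0],
     [0, 1, 0, 0, 0, 1, 0, 0, 0, -1],
     [1, 0, 0, 0, 0, -1, 0, 0, 0, 0],
     [0, -1, 0, 0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0, 0, -1, 0, 0, -1]],
    columns=list('abcdefghij'))

out = pd.DataFrame({
    'col_1': (panel_xyz != 0).sum(axis=1),   # all non-zero values
    'col_2': (panel_xyz == 1).sum(axis=1),   # all 1s
    'col_3': (panel_xyz == -1).sum(axis=1),  # all -1s
})
print(out)  # matches the desired df in the question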
answered Oct 13 '22 by JohnE