Is there a good way to find, for each row of a pandas DataFrame, the set of columns that hold non-zero values? Do I have to traverse the DataFrame row by row?
For example, the DataFrame is
c1 c2 c3 c4 c5 c6 c7 c8 c9
1 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 2 1 1 1 1 1 0 2
1 5 5 0 0 1 0 4 6
4 3 0 1 1 1 1 5 10
3 5 2 4 1 2 2 1 3
6 4 0 1 0 0 0 0 0
3 9 1 0 1 0 2 1 0
The output is expected to be
['c1','c2']
['c1']
['c2']
...
It seems you have to traverse the DataFrame row by row.
cols = df.columns
bt = df.apply(lambda x: x > 0)                      # boolean mask of non-zero cells
bt.apply(lambda x: list(cols[x.values]), axis=1)    # map each row to its matching column names
and you will get:
0 [c1, c2]
1 [c1]
2 [c2]
3 [c1]
4 [c2]
5 []
6 [c2, c3, c4, c5, c6, c7, c9]
7 [c1, c2, c3, c6, c8, c9]
8 [c1, c2, c4, c5, c6, c7, c8, c9]
9 [c1, c2, c3, c4, c5, c6, c7, c8, c9]
10 [c1, c2, c4]
11 [c1, c2, c3, c5, c7, c8]
dtype: object
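For completeness, here is a minimal self-contained sketch that rebuilds the example DataFrame from the question and produces the same result (the column names and values are copied directly from the example above):
import pandas as pd

# Rebuild the example DataFrame from the question.
data = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 2, 1, 1, 1, 1, 1, 0, 2],
    [1, 5, 5, 0, 0, 1, 0, 4, 6],
    [4, 3, 0, 1, 1, 1, 1, 5, 10],
    [3, 5, 2, 4, 1, 2, 2, 1, 3],
    [6, 4, 0, 1, 0, 0, 0, 0, 0],
    [3, 9, 1, 0, 1, 0, 2, 1, 0],
]
df = pd.DataFrame(data, columns=[f'c{i}' for i in range(1, 10)])

cols = df.columns
# Boolean mask of non-zero cells, then map each row to its matching column names.
result = df.apply(lambda x: x > 0).apply(lambda x: list(cols[x.values]), axis=1)
print(result)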
If performance matters, try passing raw=True when building the boolean DataFrame, as below:
%timeit df.apply(lambda x: x > 0, raw=True).apply(lambda x: list(cols[x.values]), axis=1)
1000 loops, best of 3: 812 µs per loop
That gives a noticeable speedup. For comparison, here is the raw=False (the default) result:
%timeit df.apply(lambda x: x > 0).apply(lambda x: list(cols[x.values]), axis=1)
100 loops, best of 3: 2.59 ms per loop
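Not part of the original answer, but if apply is still the bottleneck on a larger frame, a plain list comprehension over the underlying NumPy array is usually faster, since it skips the per-row pandas overhead (actual timings depend on your data size and machine):
import numpy as np

cols = np.asarray(df.columns)
mask = df.values > 0                           # boolean matrix of non-zero cells
result = [cols[row].tolist() for row in mask]  # one list of column names per row
Here result is a plain Python list rather than a Series; wrap it in pd.Series(result, index=df.index) if you need the same output shape as above.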