handling zeros in pandas DataFrames column divisions in Python

Question

What's the best way to handle zero denominators when dividing pandas DataFrame columns by each other in Python? for example:

df = pandas.DataFrame({"a": [1, 2, 0, 1, 5], "b": [0, 10, 20, 30, 50]})
df.a / df.b  # yields error

I'd like the ratios where the denominator is zero to be registered as NA (numpy.nan). How can this be done efficiently in pandas?

Casting to float64 does not work at level of columns:

In [29]: df
Out[29]: 
   a   b
0  1   0
1  2  10
2  0  20
3  1  30
4  5  50

In [30]: df["a"].astype("float64") / df["b"].astype("float64")
...

FloatingPointError: divide by zero encountered in divide

How can I do it just for particular columns and not entire df?

Jeff · Accepted Answer

You need to work in floats, otherwise you will have integer division, prob not what you want

In [12]: df = pandas.DataFrame({"a": [1, 2, 0, 1, 5], 
                                "b": [0, 10, 20, 30, 50]}).astype('float64')

In [13]: df
Out[13]: 
   a   b
0  1   0
1  2  10
2  0  20
3  1  30
4  5  50

In [14]: df.dtypes
Out[14]: 
a    float64
b    float64
dtype: object

Here's one way

In [15]: x = df.a/df.b

In [16]: x
Out[16]: 
0         inf
1    0.200000
2    0.000000
3    0.033333
4    0.100000
dtype: float64

In [17]: x[np.isinf(x)] = np.nan

In [18]: x
Out[18]: 
0         NaN
1    0.200000
2    0.000000
3    0.033333
4    0.100000
dtype: float64

Here's another way

In [20]: df.a/df.b.replace({ 0 : np.nan })
Out[20]: 
0         NaN
1    0.200000
2    0.000000
3    0.033333
4    0.100000
dtype: float64

Kyr · Answer

Just for completeness, I would like to add the following way of division that uses DataFrame.apply like:

df.loc[:, 'c'] = df.apply(div('a', 'b'), axis=1)

In full:

In [1]:
df = pd.DataFrame({"a": [1, 2, 0, 1, 5, 0], "b": [0, 10, 20, 30, 50, 0]}).astype('float64')

def div(numerator, denominator):
  return lambda row: 0.0 if row[denominator] == 0 else float(row[numerator]/row[denominator])

df.loc[:, 'c'] = df.apply(div('a', 'b'), axis=1)

Out[1]:
      a     b         c
0   1.0   0.0  0.000000
1   2.0  10.0  0.200000
2   0.0  20.0  0.000000
3   1.0  30.0  0.033333
4   5.0  50.0  0.100000
5   0.0   0.0  0.000000

This solution is slower than the one proposed by Jeff:

df.loc[:, 'c'] = df.apply(div('a', 'b'), axis=1)
# 1.27 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

df.loc[:, 'c'] = df.a/df.b.replace({ 0 : np.inf })
# 651 µs ± 44.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

handling zeros in pandas DataFrames column divisions in Python

Tags:

python

pandas

dataframe

numpy

2 Answers

Jeff

Kyr

Recent Activity

Donate For Us

handling zeros in pandas DataFrames column divisions in Python

Tags:

python

pandas

dataframe

numpy

2 Answers

Jeff

Kyr

Related questions

Recent Activity

Donate For Us