How to deal with "divide by zero" with pandas dataframes when manipulating columns? [duplicate]

Tags: python, python-3.x, pandas, dataframe

I'm working with hundreds of pandas dataframes. A typical dataframe is as follows:

import pandas as pd
import numpy as np
data = 'filename.csv'
df = pd.read_csv(data)
df 

        one       two     three   four   five
a  0.469112 -0.282863 -1.509059    bar   True
b  0.932424  1.224234  7.823421    bar  False
c -1.135632  1.212112 -0.173215    bar  False
d  0.232424  2.342112  0.982342  unbar   True
e  0.119209 -1.044236 -0.861849    bar   True
f -2.104569 -0.494929  1.071804    bar  False
....

There are certain operations where I divide the values of one column by those of another, e.g.

df['one']/df['two']

However, there are times when I am dividing by zero, or when both values are zero:

df['one'] = 0
df['two'] = 0

Naturally, this outputs the error:

ZeroDivisionError: division by zero

I would prefer for 0/0 to actually mean "there's nothing here", as this is often what such a zero means in a dataframe.

(a) How would I code this so that "divide by zero" yields 0?

(b) How would I code this to "pass" if divide by zero is encountered?

341

asked Aug 11 '16 02:08

ShanZhengYang

1 Answer

It would probably be more useful to use a dataframe that actually has zero in the denominator (see the last row of column two).

        one       two     three   four   five
a  0.469112 -0.282863 -1.509059    bar   True
b  0.932424  1.224234  7.823421    bar  False
c -1.135632  1.212112 -0.173215    bar  False
d  0.232424  2.342112  0.982342  unbar   True
e  0.119209 -1.044236 -0.861849    bar   True
f -2.104569  0.000000  1.071804    bar  False

>>> df.one / df.two
a   -1.658442
b    0.761639
c   -0.936904
d    0.099237
e   -0.114159
f        -inf  # <<< Note division by zero
dtype: float64
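As a quick check of what float division by zero produces in pandas, here is a minimal sketch on toy data (the names `num`, `den` and `out` are made up for illustration and are not part of the question's dataframe):

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series covering the three cases: x/0, -x/0 and 0/0.
num = pd.Series([1.0, -1.0, 0.0])
den = pd.Series([0.0, 0.0, 0.0])

# For float64 data pandas does not raise; it follows IEEE 754 semantics.
out = num / den
print(out.tolist())  # [inf, -inf, nan]
```

So a nonzero value divided by zero gives a signed infinity, while 0/0 already comes back as NaN.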

When the denominator is zero and the numerator is not, you should get inf or -inf in the result; 0/0 gives NaN. One way to convert these values is as follows:

df['result'] = df.one.div(df.two)

df.loc[~np.isfinite(df['result']), 'result'] = np.nan  # Or = 0 per part a) of question.
# or df.loc[np.isinf(df['result']), ...

>>> df
        one       two     three   four   five    result
a  0.469112 -0.282863 -1.509059    bar   True -1.658442
b  0.932424  1.224234  7.823421    bar  False  0.761639
c -1.135632  1.212112 -0.173215    bar  False -0.936904
d  0.232424  2.342112  0.982342  unbar   True  0.099237
e  0.119209 -1.044236 -0.861849    bar   True -0.114159
f -2.104569  0.000000  1.071804    bar  False       NaN
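An alternative that avoids boolean indexing is sketched below. It is not the only way to do this, and the toy frame and the names `ratio`, `as_missing`, `as_zero` and `masked` are assumptions made for illustration. It uses `Series.replace` to turn infinities into NaN for part (b) of the question and `fillna` to turn them into 0 for part (a), and it also shows masking the zero denominators before dividing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the asker's data.
df = pd.DataFrame(
    {"one": [1.0, 0.0, -2.0, 3.0], "two": [2.0, 0.0, 0.0, 4.0]},
    index=list("abcd"),
)

# Raw division: a -> 0.5, b -> NaN (0/0), c -> -inf, d -> 0.75
ratio = df["one"].div(df["two"])

# (b) "nothing here": replace +/-inf with NaN; 0/0 is already NaN.
as_missing = ratio.replace([np.inf, -np.inf], np.nan)

# (a) treat every division by zero as 0.
as_zero = as_missing.fillna(0.0)

# Variant: hide zero denominators first, so the division never sees them.
masked = df["one"].div(df["two"].where(df["two"] != 0))
```

Note that `fillna(0.0)` also overwrites any NaN that was already present in the inputs, so the masking variant may be preferable when genuinely missing values need to stay distinguishable.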

107

answered Oct 16 '22 01:10

Alexander
