Why is the `float` function slower than multiplying by 1.0?

I understand that this could be argued as a non-issue, but I write software for HPC environments, so this 3.5x speed increase actually makes a difference.

    In [1]: %timeit 10 / float(98765)
    1000000 loops, best of 3: 313 ns per loop

    In [2]: %timeit 10 / (98765 * 1.0)
    10000000 loops, best of 3: 80.6 ns per loop

I used `dis` to look at the bytecode, and I assume `float()` is slower because it requires a function call (unfortunately I couldn't run `dis.dis(float)` to see what it's actually doing).

A second question: when should I use `float(n)` and when should I use `n * 1.0`?

635

asked Apr 10 '14 09:04

Jason P

1 Answer

Because the peephole optimizer precalculates the result of that multiplication at compile time:

    import dis

    dis.dis(compile("10 / float(98765)", "<string>", "eval"))
      1           0 LOAD_CONST               0 (10)
                  3 LOAD_NAME                0 (float)
                  6 LOAD_CONST               1 (98765)
                  9 CALL_FUNCTION            1
                 12 BINARY_DIVIDE
                 13 RETURN_VALUE

    dis.dis(compile("10 / (98765 * 1.0)", "<string>", "eval"))
      1           0 LOAD_CONST               0 (10)
                  3 LOAD_CONST               3 (98765.0)
                  6 BINARY_DIVIDE
                  7 RETURN_VALUE

It stores the result of `98765 * 1.0` in the bytecode as a constant value. So it just has to load that constant and divide, whereas in the first case it has to look up `float` and call it at run time.
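
The same comparison can be sketched for Python 3, where the opcode names differ from the Python 2 listing above (there is no `BINARY_DIVIDE`, and the call opcode has been renamed across releases). This assumes CPython; the helper name `has_call` and the variable `x` are made up for illustration.

```python
import dis

def has_call(source):
    """Return True if the compiled expression contains any call instruction."""
    code = compile(source, "<string>", "eval")
    # Call opcodes have been renamed across CPython 3 releases
    # (CALL_FUNCTION, PRECALL, CALL), so match on the substring.
    return any("CALL" in ins.opname for ins in dis.get_instructions(code))

# `x` is a hypothetical run-time variable; it only needs to exist when the
# code is evaluated, not when it is compiled.
uses_float = has_call("x / float(98765)")
uses_multiply = has_call("x / (98765 * 1.0)")

print(uses_float, uses_multiply)
```

Only the `float(...)` version should report a call instruction; the multiplication of two literals leaves no call and no multiply in the bytecode.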

We can see that even more clearly by inspecting the constants stored in the code object:

    print compile("10 / (98765 * 1.0)", "<string>", "eval").co_consts
    # (10, 98765, 1.0, 98765.0)

Since the value is precalculated at compile time, the second expression is faster.
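
A minimal sketch of the same `co_consts` check for Python 3, assuming CPython (constant folding is an implementation detail, not a language guarantee). Because `98765 == 98765.0` in Python, the helper compares types as well as values. The names `has_float_const`, `x`, and `n` are hypothetical.

```python
def has_float_const(source, value):
    """Return True if the compiled expression stores `value` as a float constant."""
    code = compile(source, "<string>", "eval")
    # Compare the type too: the int 98765 equals the float 98765.0.
    return any(type(c) is float and c == value for c in code.co_consts)

# Both operands are literals, so the product can be computed at compile time.
literal_folded = has_float_const("x / (98765 * 1.0)", 98765.0)

# `n` is only known at run time, so there is nothing to precompute.
variable_folded = has_float_const("x / (n * 1.0)", 98765.0)

print(literal_folded, variable_folded)
```

This bears on the second part of the question: the saving shown here comes from folding literals, so with a variable `n` the expression `n * 1.0` still performs a multiplication at run time.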

Edit: As pointed out by Davidmh in the comments,

> And the reason why it is not also optimising away the division is because its behaviour depends on flags, like `from __future__ import division` and also because of `-Q` flag.

Quoting the comment from the actual peephole optimizer code for Python 2.7.9,

    /* Cannot fold this operation statically since
       the result can depend on the run-time presence
       of the -Qnew flag */
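
In Python 3, `/` always performs true division (PEP 238), so the result no longer depends on a run-time flag. As a rough sketch, assuming a CPython 3 build that folds constant division (an implementation detail I have not checked on every release), the quotient itself can be looked for among the constants. The variable names here are made up for illustration.

```python
code = compile("10 / (98765 * 1.0)", "<string>", "eval")

# The value the whole expression evaluates to.
quotient = 10 / 98765.0

# True if the compiler stored the finished quotient as a float constant,
# i.e. it folded the division as well as the multiplication.
division_folded = any(type(c) is float and c == quotient for c in code.co_consts)

print(code.co_consts, division_folded)
```

If `division_folded` prints `False` on a given interpreter, that build simply does not fold this division; the expression still evaluates to the same value.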

108

answered Oct 06 '22 00:10

thefourtheye
