I don't have a clue why is this happening. I was messing with some lists, and I needed a <code>for</code> loop going from 0 to <code>log(n, 2)</code> where n was the length of a list. But the code was amazingly slow, so after a bit a research I found that the problem is in the range generation. Sample code for demonstration: <pre class="prettyprint"><code>n = len([1,2,3,4,5,6,7,8]) k = 8 timeit('range(log(n, 2))', number=2, repeat=3) # Test 1 timeit('range(log(k, 2))', number=2, repeat=3) # Test 2 </code></pre> The output <pre class="prettyprint"><code>2 loops, best of 3: 2.2 s per loop 2 loops, best of 3: 3.46 µs per loop </code></pre> The number of tests is low (I didn't want this to be running more than 10 minutes), but it already shows that <code>range(log(n, 2))</code> is orders of magnitude slower than the counterpart using just the logarithm of an integer. This is really surprising and I don't have any clue on why is this happening. Maybe is a problem on my PC, maybe a Sage problem or a Python bug (I didn't try the same on Python). Using <code>xrange</code> instead of <code>range</code> doesn't help either. Also, if you get the number with <code>.n()</code>, test 1 runs at the same speed of 2. Does anybody know what can be happening? Thanks!

Python 2 allows range(some_float), but its deprecated and doesn't work in python 3. The code sample doesn't give the output specified. But we can walk through it. First, timeit needs a full script, the import in the script calling timeit is not used: <pre class="prettyprint"><code>>>> timeit('range(log(8,2))') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib64/python2.6/timeit.py", line 226, in timeit return Timer(stmt, setup, timer).timeit(number) File "/usr/lib64/python2.6/timeit.py", line 192, in timeit timing = self.inner(it, self.timer) File "<timeit-src>", line 6, in inner NameError: global name 'log' is not defined </code></pre> If you add the import to the script being timed, it includes the setup time: <pre class="prettyprint"><code>>>> timeit('from math import log;range(log(8,2))') 3.7010221481323242 </code></pre> If you move the import to the setup, its better, but timing a one-shot is notoriously inaccurate: <pre class="prettyprint"><code>>>> timeit('range(log(8,2))',setup='from math import log') 1.9139349460601807 </code></pre> Finally, run it a bunch of times and you get a good number: <pre class="prettyprint"><code>>>> timeit('range(log(8,2))',setup='from math import log',number=100) 0.00038290023803710938 </code></pre>

Why is creating a range from 0 to log(len(list), 2) so slow?

Tags:

performance

python

sage

I don't have a clue why is this happening. I was messing with some lists, and I needed a for loop going from 0 to log(n, 2) where n was the length of a list. But the code was amazingly slow, so after a bit a research I found that the problem is in the range generation. Sample code for demonstration:

n = len([1,2,3,4,5,6,7,8])
k = 8
timeit('range(log(n, 2))', number=2, repeat=3) # Test 1
timeit('range(log(k, 2))', number=2, repeat=3) # Test 2

The output

2 loops, best of 3: 2.2 s per loop
2 loops, best of 3: 3.46 µs per loop

The number of tests is low (I didn't want this to be running more than 10 minutes), but it already shows that range(log(n, 2)) is orders of magnitude slower than the counterpart using just the logarithm of an integer. This is really surprising and I don't have any clue on why is this happening. Maybe is a problem on my PC, maybe a Sage problem or a Python bug (I didn't try the same on Python).

Using xrange instead of range doesn't help either. Also, if you get the number with .n(), test 1 runs at the same speed of 2.

Does anybody know what can be happening? Thanks!

489

asked Apr 29 '13 22:04

gjulianm

3 Answers

Good grief -- I recognize this one. It's related to one of mine, trac #12121. First, you get extra overhead from using a Python int as opposed to a Sage Integer for boring reasons:

sage: log(8, 2)
3
sage: type(log(8, 2))
sage.rings.integer.Integer
sage: log(8r, 2)
log(8)/log(2)
sage: type(log(8r, 2))
sage.symbolic.expression.Expression
sage: %timeit log(8, 2)
1000000 loops, best of 3: 1.4 us per loop
sage: %timeit log(8r, 2)
1000 loops, best of 3: 404 us per loop

(The r suffix means "raw", and prevents the Sage preparser from wrapping the literal 2 into Integer(2))

And then it gets weird. In order to produce an int for range to consume, Sage has to figure out how to turn log(8)/log(2) into 3, and it turns out that she does the worst thing possible. Plagiarizing my original diagnosis (mutatis mutandis):

First she checks to see if this object has its own way to get an int, and it doesn't. So she builds a RealInterval object out of log(8)/log(2), and it turns out that this is about the worst thing she could do! She checks to see whether the lower and upper parts of the interval agree [on the floor, I mean] (so that she knows for certain what the floor is). But in this case, because it really is an integer! this is always going to look like:

sage: y = log(8)/log(2)
sage: rif = RealIntervalField(53)(y)
sage: rif
3.000000000000000?
sage: rif.endpoints()
(2.99999999999999, 3.00000000000001)

These two bounds have floors which aren't aren't equal, so Sage decides she hasn't solved the problem yet, and she keeps increasing the precision to 20000 bits to see if she can prove that they are.. but by construction it's never going to work. Finally she gives up and tries to simplify it, which succeeds:

sage: y.simplify_full()
3

Proof without words that it's a perverse property of the exactly divisible case:

sage: %timeit range(log(8r, 2))
1 loops, best of 3: 2.18 s per loop
sage: %timeit range(log(9r, 2))
1000 loops, best of 3: 766 us per loop
sage: %timeit range(log(15r, 2))
1000 loops, best of 3: 764 us per loop
sage: %timeit range(log(16r, 2))
1 loops, best of 3: 2.19 s per loop

163

answered Oct 13 '22 23:10

DSM

This looks like it's a Sage bug.

I created a new notebook and did this:

n = len([1,2,3,4,5,6,7,8])
k = 8
timeit('range(log(n, 2))', number=2, repeat=3) # Test 1
timeit('range(log(len([1,2,3,4,5,6,7,8]), 2))', number=2, repeat=3) # Test 1.5
timeit('range(log(k, 2))', number=2, repeat=3) # Test 2

Test 1.5 is just as slow as test 1. But if you break it down in any way—take off the range, or even add m=n+0 and use m instead of n, it drops down to microseconds.

So clearly, Sage is trying to do something complicated here while evaluating the expression, and getting confused.

To verify this, in plain old ipython:

n = len([1,2,3,4,5,6,7,8])
k = 8
%timeit 'range(log(n, 2))'
%timeit 'range(log(len([1,2,3,4,5,6,7,8]), 2))'
%timeit 'range(log(k, 2))'

They're all equally fast, as you'd expect.

So… what do you do about it?

Well, you may want to try to track down the Sage bug and file it upstream. But meanwhile, you probably want a workaround in your code.

As noted above, just doing m = n+0 and using m instead of n seems to speed it up. See if that works for you?

answered Oct 14 '22 00:10

abarnert

Python 2 allows range(some_float), but its deprecated and doesn't work in python 3.

The code sample doesn't give the output specified. But we can walk through it. First, timeit needs a full script, the import in the script calling timeit is not used:

>>> timeit('range(log(8,2))')
  Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/timeit.py", line 226, in timeit
    return Timer(stmt, setup, timer).timeit(number)
  File "/usr/lib64/python2.6/timeit.py", line 192, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
NameError: global name 'log' is not defined

If you add the import to the script being timed, it includes the setup time:

>>> timeit('from math import log;range(log(8,2))')
3.7010221481323242

If you move the import to the setup, its better, but timing a one-shot is notoriously inaccurate:

>>> timeit('range(log(8,2))',setup='from math import log')
1.9139349460601807

Finally, run it a bunch of times and you get a good number:

>>> timeit('range(log(8,2))',setup='from math import log',number=100)
0.00038290023803710938

answered Oct 13 '22 23:10

tdelaney

Related questions
                            
                                Meaning of the backtick character in Python
                            
                                Why is BeautifulSoup unable to correctly read/parse this RSS (XML) document?
                            
                                Pandas slicing along multiindex and separate indices
                            
                                psycopg2 "IndexError: tuple index out of range" Error when using '%' like operator with arguments tuple
                            
                                Python in Windows Store apps
                            
                                Graph traversal with Networkx (Python)
                            
                                How to draw with Vertex Array Objects and glDrawElements in PyOpenGL
                            
                                How to set up and solve simultaneous equations in python
                            
                                Display an image located in the database in Django
                            
                                Django HTTP 500 Error
                            
                                Fill the right column of a matplotlib legend first
                            
                                Flask Database Issue
                            
                                Django query single underscore behaving like double underscore?
                            
                                How to remove escape sequence like '\xe2' or '\x0c' in python
                            
                                askopenfilename handling cancel on dialogue
                            
                                does calling a shell command from within a scripting language slow down performance?
                            
                                django serializers to json - custom json output format
                            
                                How do I compare 2D lists for equality in Python?
                            
                                How to show a window that was hidden using "withdraw" method?
                            
                                Using pandas to read text file with leading whitespace gives a NaN column

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With