Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does increasing precision make this program faster?

I was solving problem 26 on Project Euler where I need to calculate the length of the recurring part of 1/n where n is all integers between 1 and 1000, and see which number makes the longest recurring part. That meant that I needed my division done a lot more precisely. So I was playing around with my decimal precision by changing getContext().prec, but then somehow increasing the accuracy made the program a lot faster. I ran this program using Python 3.7. Here is the code:

import re
import time
s = time.time()
from decimal import *
getcontext().prec = 500 #This part
recurring = 0
answer = 0
p = re.compile(r"([0-9]+?)\1{3,}")
for i in range(1, 1000):
    f = p.search(str(Decimal(1) / Decimal(i))[5:])
    if f:
        number = f.group(1)
        if len(str(number)) > len(str(recurring)):
            recurring = number
            answer = i

print(answer)
print(time.time() - s)

This was the result when I used precision of 500:

>>> print(answer)
349
>>> print(time.time() - s)
2.923844575881958

...and this is what I got when I used precision of 5000:

>>> print(answer)
983
>>> print(time.time() - s)
0.07812714576721191

I swapped 500 with 5000, and not only did it give me the correct answer since the recurring part of 1/answer was probably longer than 500, but it also was much faster. I have tried this with an online Python interpreter, and it also gave me similar result. Why is it like this?

like image 952
0'' Avatar asked Jan 13 '19 13:01

0''


2 Answers

It is the combination of + and \1 in the regex

Methods

I used the following test code:

import time
import re
import string
t=time.time()
re.compile() # I tried differend regexes here
print(time.time()-t)
def test(n):
    t=time.time()
    match = rex.search(string.ascii_lowercase*n)
    print(match, time.time()-t)

After restarting the python session, the first call to re.compile takes longer than subsequent compilations of the same regex.

                                        compile(sec)   search (sec)    
REGEX                                   1st     2nd    short   long string
r"(abcdefghijklmnopqrstuvwxyz){6,}"     10^-4   10^-5  10^-5   10^-5
r"(abcdefghijklmnopqrstuvwxyz)\1\1\1\1" 10^-4   10^-5  10^-6   10^-6
r"([a-z]+?)\1\1\1\1"                    10^-4   10^-5  10^-4   10^-5 
r"([a-z]+)\1\1\1\1"                     10^-4   10^-5  10^-4   10^-5
r"([a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z])\1\1\1\1"
                                        10^-4  10^-5  10^-6  10^-6

Interestingly, occasionally r"([a-z]+?)\1\1\1" would be quick (10^-5 sec) for too short strings as well.

Discussion

There is some caching involved in compiling the rexex, but this was not the reason here.

It seems that the combination of the + operator (both greedy and non-greedy) inside the group and the \1 in the regex is at fault. For some reason, this combination is quicker if it actually matches than if it does not match.

To find out more, we probably have to understand the C source code of the sre module

like image 107
Bernhard Avatar answered Sep 17 '22 15:09

Bernhard


Something happens around prec == 4000. All answers are equal to 983, and time changes only slightly linearly from 4000 up. Maybe take a closer look there.

There is also a small dip around 2000. You need to measure separately time elapsed during Decimal divison and time elapsed during regex search to get more info.

On this image: prec (horizontal) vs. time in seconds (vertical) prec vs. time

like image 30
user2622016 Avatar answered Sep 18 '22 15:09

user2622016