I was solving problem 26 on Project Euler where I need to calculate the length of the recurring part of 1/n where n is all integers between 1 and 1000, and see which number makes the longest recurring part. That meant that I needed my division done a lot more precisely. So I was playing around with my decimal precision by changing getContext().prec
, but then somehow increasing the accuracy made the program a lot faster. I ran this program using Python 3.7. Here is the code:
import re
import time
s = time.time()
from decimal import *
getcontext().prec = 500 #This part
recurring = 0
answer = 0
p = re.compile(r"([0-9]+?)\1{3,}")
for i in range(1, 1000):
f = p.search(str(Decimal(1) / Decimal(i))[5:])
if f:
number = f.group(1)
if len(str(number)) > len(str(recurring)):
recurring = number
answer = i
print(answer)
print(time.time() - s)
This was the result when I used precision of 500:
>>> print(answer)
349
>>> print(time.time() - s)
2.923844575881958
...and this is what I got when I used precision of 5000:
>>> print(answer)
983
>>> print(time.time() - s)
0.07812714576721191
I swapped 500 with 5000, and not only did it give me the correct answer since the recurring part of 1/answer was probably longer than 500, but it also was much faster. I have tried this with an online Python interpreter, and it also gave me similar result. Why is it like this?
I used the following test code:
import time
import re
import string
t=time.time()
re.compile() # I tried differend regexes here
print(time.time()-t)
def test(n):
t=time.time()
match = rex.search(string.ascii_lowercase*n)
print(match, time.time()-t)
After restarting the python session, the first call to re.compile
takes longer than subsequent compilations of the same regex.
compile(sec) search (sec)
REGEX 1st 2nd short long string
r"(abcdefghijklmnopqrstuvwxyz){6,}" 10^-4 10^-5 10^-5 10^-5
r"(abcdefghijklmnopqrstuvwxyz)\1\1\1\1" 10^-4 10^-5 10^-6 10^-6
r"([a-z]+?)\1\1\1\1" 10^-4 10^-5 10^-4 10^-5
r"([a-z]+)\1\1\1\1" 10^-4 10^-5 10^-4 10^-5
r"([a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z])\1\1\1\1"
10^-4 10^-5 10^-6 10^-6
Interestingly, occasionally r"([a-z]+?)\1\1\1"
would be quick (10^-5 sec) for too short strings as well.
There is some caching involved in compiling the rexex, but this was not the reason here.
It seems that the combination of the +
operator (both greedy and non-greedy) inside the group and the \1
in the regex is at fault. For some reason, this combination is quicker if it actually matches than if it does not match.
To find out more, we probably have to understand the C source code of the sre
module
Something happens around prec == 4000. All answers are equal to 983, and time changes only slightly linearly from 4000 up. Maybe take a closer look there.
There is also a small dip around 2000. You need to measure separately time elapsed during Decimal divison and time elapsed during regex search to get more info.
On this image: prec (horizontal) vs. time in seconds (vertical)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With