I am writing a Python script that checks every possible URL and logs the ones that respond to a request.
I found a post on Stack Overflow that suggested a method of generating the strings for the URLs, which works well:
for n in range(1, 4 + 1):
    for comb in product(chars, repeat=n):
        url = "http://" + ''.join(comb) + ".com"
        currentUrl = url
        checkUrl(url)
As you can imagine there are way too many URLs and it is going to take a very long time, so I am trying to find a way to save my script's progress and resume from where it left off.
My question is: how can I have the loop start from a specific place? Or does anyone have a working piece of code that does the same thing and will let me specify a starting point?
This is my script so far:
import urllib.request
from string import digits, ascii_uppercase, ascii_lowercase
from itertools import product

goodUrls = "Valid_urls.txt"
saveFile = "save.txt"
currentUrl = ''

def checkUrl(url):
    print("Trying - " + url)
    try:
        urllib.request.urlopen(url)
    except Exception as e:
        None
    else:
        log = open(goodUrls, 'a')
        log.write(url + '\n')

chars = digits + ascii_lowercase

try:
    while True:
        for n in range(1, 4 + 1):
            for comb in product(chars, repeat=n):
                url = "http://" + ''.join(comb) + ".com"
                currentUrl = url
                checkUrl(url)
except KeyboardInterrupt:
    print("Saving and Exiting")
    open(saveFile, 'w').write(currentUrl)
The return value of itertools.product is an iterator, and an iterator remembers its position. As such, all you'll have to do is:
products = product(...)
for foo in products:
    if bar(foo):
        spam(foo)
        break

# other stuff

for foo in products:
    # starts where you left off.
    ...
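To see that "starts where you left off" behaviour concretely with the same alphabet as your script, here is a minimal, self-contained sketch (repeat=2 and the "0z" stopping point are chosen only to keep the demonstration short):

from itertools import product
from string import digits, ascii_lowercase

chars = digits + ascii_lowercase
products = product(chars, repeat=2)

# First pass: stop part-way through.
for comb in products:
    if ''.join(comb) == "0z":       # arbitrary stopping point for illustration
        break

# Second pass: the iterator remembers its position,
# so the next combination out is the one after "0z".
print(''.join(next(products)))      # prints "10"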
In your case the time taken to iterate through the possibilities is pretty small, at least compared to the time it'll take to make all those network requests. You could either save all the possibilities to disk and dump a list of what's left after every run of the program, or you could just save which number you're on. Since product has deterministic output, that should do it.
try:
    with open("progress.txt") as f:
        first_up = int(f.read().strip())
except FileNotFoundError:
    first_up = -1  # no progress saved yet, so don't skip anything

try:
    for i, foo in enumerate(products):
        if i <= first_up:
            continue  # skip combinations already handled on a previous run
        # do stuff down here
except KeyboardInterrupt:
    # this is really rude to do, by the by....
    print("Saving and exiting")
    with open("progress.txt", "w") as f:
        f.write(str(i))
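Since the output of product is deterministic, you could also skip ahead with itertools.islice instead of the enumerate/continue pattern. This is only a sketch, reusing the progress.txt convention from above and assuming the saved index was the last one handled:

from itertools import islice, product
from string import digits, ascii_lowercase

chars = digits + ascii_lowercase
products = product(chars, repeat=4)

try:
    with open("progress.txt") as f:
        first_up = int(f.read().strip()) + 1  # resume after the saved index
except FileNotFoundError:
    first_up = 0

# islice silently consumes the first `first_up` combinations, then yields the rest.
for i, comb in enumerate(islice(products, first_up, None), start=first_up):
    url = "http://" + ''.join(comb) + ".com"
    # check the url here, and write str(i) back to progress.txt on interrupt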
If there's some reason you need a human-readable "progress" file, you can save the last string you tried (as you did above) and do:

for foo in itertools.dropwhile(lambda p: ''.join(p) != saved_string, products):
    # dropwhile discards combinations until it reaches the saved string,
    # then yields it and everything after it, so the saved one is retried.
    ...
Although the attempt to find all the URLs by this method is ridiculous, the general question posed is a very good one. The short answer is that you cannot pickle an iterator in a straightforward way, because the pickle mechanism can't save the iterator's internal state. However, you can pickle an object that implements both __iter__ and __next__. So if you create a class that has the desired functionality and also works as an iterator (by implementing those two methods), it can be pickled and reloaded. The reloaded object, when you make an iterator from it, will continue from where it left off.
#! python3.6
import pickle

class AllStrings:
    CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"

    def __init__(self):
        self.indices = [0]

    def __iter__(self):
        return self

    def __next__(self):
        # Build the current string, then advance the indices like an odometer.
        s = ''.join([self.CHARS[n] for n in self.indices])
        for m in range(len(self.indices)):
            self.indices[m] += 1
            if self.indices[m] < len(self.CHARS):
                break
            self.indices[m] = 0
        else:
            self.indices.append(0)
        return s

try:
    with open("bookmark.txt", "rb") as f:
        all_strings = pickle.load(f)
except IOError:
    all_strings = AllStrings()

try:
    for s in iter(all_strings):
        print(s)
except KeyboardInterrupt:
    with open("bookmark.txt", "wb") as f:
        pickle.dump(all_strings, f)
This solution also removes the limitation on the length of the string. The iterator will run forever, eventually generating all possible strings. Of course at some point the application will stop due to the increasing entropy of the universe.
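To convince yourself that a reloaded object really does continue where it left off, you can round-trip an AllStrings instance through pickle in memory. This is just a quick check using the class above; itertools.islice is only used to pull a handful of values:

import pickle
from itertools import islice

all_strings = AllStrings()
print(list(islice(all_strings, 5)))    # ['a', 'b', 'c', 'd', 'e']

# Serialize mid-iteration, then restore into a fresh object.
restored = pickle.loads(pickle.dumps(all_strings))
print(list(islice(restored, 5)))       # ['f', 'g', 'h', 'i', 'j']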