
Using itertools.product and want to seed a value

So I've written a small script to download pictures from a website. It goes through 7-character alphanumeric values, where the first character is always a number. The problem is that if I want to stop the script and start it up again, I have to start all over.

Can I seed itertools.product somehow with the last value I got, so I don't have to go through them all again?

Thanks for any input.

Here is part of the code:

numbers = '0123456789'
alnum = numbers + 'abcdefghijklmnopqrstuvwxyz'

len7 = itertools.product(numbers, alnum, alnum, alnum, alnum, alnum, alnum) # length 7

for p in itertools.chain(len7):
    currentid = ''.join(p) 

    #semi static vars
    url = 'http://mysite.com/images/'
    url += currentid

    #Need to get the real url cause the redirect
    print "Trying " + url
    req = urllib2.Request(url)
    res = openaurl(req)
    if res == "continue": continue
    finalurl = res.geturl()

    #ok we have the full url, now time to check if it is real
    try: file = urllib2.urlopen(finalurl)
    except urllib2.HTTPError, e:
        print e.code
        continue  # no response to read, so move on to the next id

    im = cStringIO.StringIO(file.read())
    img = Image.open(im)
    writeimage(img)
Ryan asked Mar 25 '12 22:03


3 Answers

here's a solution based on pypy's library code (thanks to agf's suggestion in the comments).

the state is available via the .state attribute and can be reset via .goto(state) where state is an index into the sequence (starting at 0). there's a demo at the end (you need to scroll down, i'm afraid).

this is way faster than discarding values.

> cat prod.py 

class product(object):

    def __init__(self, *args, **kw):
        if len(kw) > 1:
            raise TypeError("product() takes at most 1 argument (%d given)" %
                             len(kw))
        self.repeat = kw.get('repeat', 1)
        self.gears = [x for x in args] * self.repeat
        self.num_gears = len(self.gears)
        self.reset()

    def reset(self):
        # initialization of indicies to loop over
        self.indicies = [(0, len(self.gears[x]))
                         for x in range(0, self.num_gears)]
        self.cont = True
        self.state = 0

    def goto(self, n):
        self.reset()
        self.state = n
        x = self.num_gears
        while n > 0 and x > 0:
            x -= 1
            n, m = divmod(n, len(self.gears[x]))
            self.indicies[x] = (m, self.indicies[x][1])
        if n > 0:
            self.reset()
            raise ValueError("state exceeded")

    def roll_gears(self):
        # Starting from the end of the gear indicies work to the front
        # incrementing the gear until the limit is reached. When the limit
        # is reached carry operation to the next gear
        self.state += 1
        should_carry = True
        for n in range(0, self.num_gears):
            nth_gear = self.num_gears - n - 1
            if should_carry:
                count, lim = self.indicies[nth_gear]
                count += 1
                if count == lim and nth_gear == 0:
                    self.cont = False
                if count == lim:
                    should_carry = True
                    count = 0
                else:
                    should_carry = False
                self.indicies[nth_gear] = (count, lim)
            else:
                break

    def __iter__(self):
        return self

    def next(self):
        if not self.cont:
            raise StopIteration
        l = []
        for x in range(0, self.num_gears):
            index, limit = self.indicies[x]
            l.append(self.gears[x][index])
        self.roll_gears()
        return tuple(l)

p = product('abc', '12')
print list(p)
p.reset()
print list(p)
p.goto(2)
print list(p)
p.goto(4)
print list(p)
> python prod.py 
[('a', '1'), ('a', '2'), ('b', '1'), ('b', '2'), ('c', '1'), ('c', '2')]
[('a', '1'), ('a', '2'), ('b', '1'), ('b', '2'), ('c', '1'), ('c', '2')]
[('b', '1'), ('b', '2'), ('c', '1'), ('c', '2')]
[('c', '1'), ('c', '2')]

you should test it more - i may have made a dumb mistake - but the idea is quite simple, so you should be able to fix it :o) you're free to use my changes; no idea what the original pypy licence is.

also state isn't really the full state - it doesn't include the original arguments - it's just an index into the sequence. maybe it would have been better to call it index, but there are already indici[sic]es in the code...
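The index idea behind .goto() can be shown on its own. Below is a minimal Python 3 sketch (not the pypy code): the helper names index_to_indices and indices_to_index are made up for illustration, and it assumes the last sequence varies fastest, as in itertools.product.

```python
from itertools import product

def index_to_indices(n, lengths):
    """Split a flat index into one position per sequence (last varies fastest)."""
    positions = []
    for length in reversed(lengths):
        n, m = divmod(n, length)
        positions.append(m)
    if n > 0:
        raise ValueError("state exceeded")
    return positions[::-1]

def indices_to_index(positions, lengths):
    """Inverse of index_to_indices: fold the positions back into a flat index."""
    n = 0
    for pos, length in zip(positions, lengths):
        n = n * length + pos
    return n

seqs = ('abc', '12')
lengths = [len(s) for s in seqs]
all_items = list(product(*seqs))

pos = index_to_indices(4, lengths)
item = tuple(s[i] for s, i in zip(seqs, pos))
```

Here item should be the same tuple that list(product(*seqs))[4] gives, without stepping through the first four.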

update

here's a simpler version that is the same idea but works by transforming a sequence of numbers. so you just imap it over count(n) to get the sequence offset by n. (note that the first sequence varies fastest here, which is the opposite of itertools.product.)

> cat prod2.py 

from itertools import count, imap

def make_product(*values):
    def fold((n, l), v):
        (n, m) = divmod(n, len(v))
        return (n, l + [v[m]])
    def product(n):
        (n, l) = reduce(fold, values, (n, []))
        if n > 0: raise StopIteration
        return tuple(l)
    return product

print list(imap(make_product(['a','b','c'], [1,2,3]), count()))
print list(imap(make_product(['a','b','c'], [1,2,3]), count(3)))

def product_from(n, *values):
    return imap(make_product(*values), count(n))

print list(product_from(4, ['a','b','c'], [1,2,3]))

> python prod2.py 
[('a', 1), ('b', 1), ('c', 1), ('a', 2), ('b', 2), ('c', 2), ('a', 3), ('b', 3), ('c', 3)]
[('a', 2), ('b', 2), ('c', 2), ('a', 3), ('b', 3), ('c', 3)]
[('b', 2), ('c', 2), ('a', 3), ('b', 3), ('c', 3)]

(the downside here is that if you want to stop and restart you need to have kept track yourself of how many you have used)
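One way to keep track is to yield the index together with each value. The following is a rough Python 3 sketch of the same idea (imap and tuple parameters are gone in Python 3), assuming the prod2.py ordering where the first sequence varies fastest. Yielding (index, value) pairs is an addition for illustration, not part of the code above.

```python
def make_product(*values):
    def nth(n):
        item = []
        for v in values:  # first sequence varies fastest, as in prod2.py
            n, m = divmod(n, len(v))
            item.append(v[m])
        if n > 0:
            raise IndexError("index past the end of the product")
        return tuple(item)
    return nth

def product_from(start, *values):
    """Yield (index, tuple) pairs from `start` to the end of the product."""
    total = 1
    for v in values:
        total *= len(v)
    nth = make_product(*values)
    for n in range(start, total):
        yield n, nth(n)

pairs = list(product_from(4, ['a', 'b', 'c'], [1, 2, 3]))
```

The caller can then store the last index it finished and pass that index plus one as start on the next run.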

andrew cooke answered Sep 20 '22 19:09


Once you get a fair way along the iterator, it's going to take a while to get to the spot using dropwhile.

You probably should adapt a recipe like this so that you can save the state with a pickle between runs.

Make sure that your script can only run once at a time, or you will need something more elaborate, such as a server process that hands out the ids to the scripts.
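As a rough illustration of saving state with pickle, here is a minimal Python 3 sketch. It assumes the state is just an integer index into the sequence; load_index, save_index and the file name progress.pickle are hypothetical names, and the sketch does nothing to stop two copies of the script from running at once.

```python
import os
import pickle
import tempfile

def load_index(path):
    """Return the saved index, or 0 if there is no checkpoint yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return 0

def save_index(path, index):
    """Write to a temporary file first, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(index, f)
    os.replace(tmp, path)

# demo with a throwaway checkpoint file (the name is made up)
checkpoint = os.path.join(tempfile.mkdtemp(), "progress.pickle")
first = load_index(checkpoint)   # no file yet, so this falls back to 0
save_index(checkpoint, 12345)
resumed = load_index(checkpoint)
```

Writing to a temporary file and renaming it means a crash in the middle of a save should leave the previous checkpoint in place rather than a half-written one.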

John La Rooy answered Sep 18 '22 19:09


If your input sequences don't contain duplicate values, this may be faster than using dropwhile to advance product. It calculates the index at which to resume, so it doesn't have to compare every dropped value against the one you are looking for.

from functools import reduce  # builtin in Python 2, must be imported in Python 3
from itertools import product, islice
from operator import mul

def resume_product(state, *sequences):
    start = 0
    seqlens = [len(seq) for seq in sequences]  # a list, since it is zipped twice
    if any(len(set(seq)) != seqlen for seq, seqlen in zip(sequences, seqlens)):
        raise ValueError("One of your sequences contains duplicate values")
    current = end = reduce(mul, seqlens)
    for i, seq, seqlen in zip(state, sequences, seqlens):
        current //= seqlen  # floor division keeps the index an integer
        start += seq.index(i) * current
    # skip everything up to and including `state`
    return islice(product(*sequences), start + 1, end)


seqs = '01', '23', '45', '678'        

# if I want to resume after '1247':
for i in resume_product('1247', *seqs):
    # blah blah
    pass
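Applied to the ids in the question, the same index arithmetic might look like the Python 3 sketch below. The names id_to_index, index_to_id and ids_after are invented for illustration. Unlike islice, this sketch builds each id directly from its index, so it doesn't step through the skipped values at all.

```python
numbers = '0123456789'
alnum = numbers + 'abcdefghijklmnopqrstuvwxyz'
alphabets = [numbers] + [alnum] * 6  # same layout as len7 in the question

def id_to_index(current_id):
    """Position of an id in the product (last character varies fastest)."""
    n = 0
    for ch, alphabet in zip(current_id, alphabets):
        n = n * len(alphabet) + alphabet.index(ch)
    return n

def index_to_id(n):
    """Inverse of id_to_index."""
    chars = []
    for alphabet in reversed(alphabets):
        n, m = divmod(n, len(alphabet))
        chars.append(alphabet[m])
    return ''.join(reversed(chars))

def ids_after(last_id):
    """Yield every id that comes after last_id, in product order."""
    total = 1
    for alphabet in alphabets:
        total *= len(alphabet)
    for n in range(id_to_index(last_id) + 1, total):
        yield index_to_id(n)

gen = ids_after('0abc9zz')
first_three = [next(gen) for _ in range(3)]
```

The script would then only need to remember the last id it tried (for example by writing it to a file) and loop over ids_after(last_id) on the next start.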
agf answered Sep 18 '22 19:09