I have to label something in a "strong monotone increasing" fashion. Be it Invoice Numbers, shipping label numbers or the like.
Fancy way of saying: I need to count 1,2,3,4 ... The number Space I have available are typically 100.000 numbers and I need perhaps 1000 a day.
I know this is a hard Problem in distributed systems and often we are much better of with GUIDs. But in this case for legal reasons I need "traditional numbering".
Can this be implemented on Google AppEngine (preferably in Python)?
If you absolutely have to have sequentially increasing numbers with no gaps, you'll need to use a single entity, which you update in a transaction to 'consume' each new number. You'll be limited, in practice, to about 1-5 numbers generated per second - which sounds like it'll be fine for your requirements.
If you drop the requirement that IDs must be strictly sequential, you can use a hierarchical allocation scheme. The basic idea/limitation is that transactions must not affect multiple storage groups.
For example, assuming you have the notion of "users", you can allocate a storage group for each user (creating some global object per user). Each user has a list of reserved IDs. When allocating an ID for a user, pick a reserved one (in a transaction). If no IDs are left, make a new transaction allocating 100 IDs (say) from the global pool, then make a new transaction to add them to the user and simultaneously withdraw one. Assuming each user interacts with the application only sequentially, there will be no concurrency on the user objects.
The gaetk - Google AppEngine Toolkit now comes with a simple library function to get a number in a sequence. It is based on Nick Johnson's transactional approach and can be used quite easily as a foundation for Martin von Löwis' sharding approach:
>>> from gaeth.sequences import *
>>> init_sequence('invoce_number', start=1, end=0xffffffff)
>>> get_numbers('invoce_number', 2)
[1, 2]
The functionality is basically implemented like this:
def _get_numbers_helper(keys, needed):
results = []
for key in keys:
seq = db.get(key)
start = seq.current or seq.start
end = seq.end
avail = end - start
consumed = needed
if avail <= needed:
seq.active = False
consumed = avail
seq.current = start + consumed
seq.put()
results += range(start, start + consumed)
needed -= consumed
if needed == 0:
return results
raise RuntimeError('Not enough sequence space to allocate %d numbers.' % needed)
def get_numbers(needed):
query = gaetkSequence.all(keys_only=True).filter('active = ', True)
return db.run_in_transaction(_get_numbers_helper, query.fetch(5), needed)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With