
How can I generate a random url of a certain length every time a page is created?

In my Python/Pyramid app, I let users generate HTML pages, which are stored in an Amazon S3 bucket. I want each page to have its own path, like www.domain.com/2cxj4kl. I have figured out how to generate the random string to put in the URL, but I am concerned about duplicates. How can I check each new string against the existing ones so that nothing is overwritten? Can I just put each string in a dictionary or array and check that ever-growing collection each time a new one is created? Are there issues with letting such an object keep growing, and will it persist in the app's memory? How should I do this?

BigBoy1337 asked Feb 18 '23 07:02


2 Answers

Storing the existing identifiers somewhere and comparing each new identifier against them works in a simple case. It gets tricky, however, if you have to store, say, billions of identifiers, or if you want to generate them on more than one machine. You also have to deal with storing, retrieving and comparing the list, not to mention locking: what if two users create a page at exactly the same moment?

Universally Unique Identifiers (UUIDs) have a very low chance of collision, much lower than, say, the chance of our planet being swallowed by a black hole in the next five minutes. It is so low that you can ignore it for any practical purpose.

Python's standard library has a module called uuid for generating UUIDs:

>>> import uuid
>>> # make a random UUID
>>> u = uuid.uuid4()
>>> u.hex
'f3db6f9a34ed48938a45113ac4b5f156'

The resulting string is 32 characters long, which may be too long for you.
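
If 32 characters is too long but you still want the full UUID, one option is to encode its 16 raw bytes with URL-safe base64 instead of hex. This is a minimal sketch; the function name short_uuid is made up for illustration.

```python
import base64
import uuid


def short_uuid() -> str:
    """Return a random UUID encoded as 22 URL-safe characters (hypothetical helper)."""
    raw = uuid.uuid4().bytes                 # the UUID's 16 raw bytes
    encoded = base64.urlsafe_b64encode(raw)  # 24 bytes, the last two are "=" padding
    return encoded.rstrip(b"=").decode("ascii")
```

The result uses only letters, digits, "-" and "_", so it can go straight into a path. It carries the same randomness as the hex form, just packed into fewer characters.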

Alternatively, you may just generate a random string like this:

import random, string
''.join(random.choice(string.ascii_letters + string.digits) for _ in range(12))

At 10-15 characters it will be less random than a UUID (12 characters from this 62-character alphabet carry about 71 bits of randomness, versus 122 for a version 4 UUID), but the chance of a collision would still be much lower than, say, the chance of a janitor at an Amazon data center going mental, destroying your server with an axe and setting the data center on fire :)
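
To put a rough number on that chance, the standard birthday approximation can be used. The sketch below assumes every string is drawn uniformly and independently from the 62-character alphabet; the function name collision_probability is only illustrative.

```python
import math
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters


def collision_probability(n_ids: int, length: int,
                          alphabet_size: int = len(ALPHABET)) -> float:
    """Approximate chance that at least two of n_ids random strings are equal."""
    space = alphabet_size ** length  # number of possible strings
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N))
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))
```

Under this approximation, a million 7-character strings (the length of the example in the question) collide with a probability of roughly 13%, while a million 12-character strings come out at around 1.5e-10.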

Sergey answered Feb 19 '23 19:02


I am new to Python and programming, but here are a few issues I can see with the 'random string' idea:

With shorter strings you will most probably end up generating the same string more than once. With longer strings the chance of getting the same string is lower. Either way you will want to watch out for duplicates. My suggestion is therefore to estimate how many URLs you will need and pick a string length to match.
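
One way to turn that estimate into a length is to search for the shortest string whose collision chance stays under a limit you are comfortable with. This is only a sketch: it assumes uniformly random strings over a 62-character alphabet, uses the birthday approximation, and min_length is a made-up name.

```python
import math


def min_length(expected_urls: int, max_collision_prob: float,
               alphabet_size: int = 62) -> int:
    """Shortest length whose approximate collision chance is within the limit."""
    length = 1
    while True:
        space = alphabet_size ** length  # possible strings of this length
        p = -math.expm1(-expected_urls * (expected_urls - 1) / (2 * space))
        if p <= max_collision_prob:
            return length
        length += 1
```

For example, min_length(10**6, 1e-6) returns 10: a million URLs need 10 characters to keep the chance of any collision below one in a million.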

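The easiest way is to keep these URLs in a list and check it before registering a new one, generating again until you get a string that is not taken:

new_url = generate_new_url()
while new_url in url_list:
    # already taken: try a fresh string and check again
    new_url = generate_new_url()
url_list.append(new_url)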
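
A slightly fuller sketch of the same check is shown below. It swaps the list for a set, since membership tests on a set do not slow down as it grows, and it uses the standard secrets module instead of random so the URLs are harder to guess. Both are substitutions, not part of the original suggestion, and the names register_url and max_attempts are hypothetical.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_new_url(length: int = 7) -> str:
    """Return a random string of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def register_url(existing: set, length: int = 7, max_attempts: int = 10) -> str:
    """Generate a string not yet in `existing`, record it, and return it."""
    for _ in range(max_attempts):
        candidate = generate_new_url(length)
        if candidate not in existing:
            existing.add(candidate)
            return candidate
    raise RuntimeError("no unused URL found; consider a longer length")
```

Keep in mind that such a set lives only in the memory of one process: it is lost when the app restarts and is not shared between worker processes, so it cannot be the only record of which URLs exist.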

However, it also sounds like you will want a database to store your URLs permanently. In most SQL databases you can declare the URL column as 'unique', so the database itself stops you from storing duplicate URLs.

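I am not sure, but with a database you can probably do something like this:

while True:
    try:
        insert_url(new_url)  # placeholder for your own INSERT into the unique column
        break
    except IntegrityError:  # raised by DB-API drivers (e.g. sqlite3) on a duplicate
        new_url = generate_new_url()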
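
Here is a runnable sketch of that idea using the standard sqlite3 module as a stand-in for whatever database the app actually uses. The table name pages and the functions open_db and create_page are assumptions made for the example.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits


def generate_new_url(length: int = 7) -> str:
    """Return a random string of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the (hypothetical) pages table exists."""
    conn = sqlite3.connect(path)
    # PRIMARY KEY implies a uniqueness constraint on url
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY)")
    return conn


def create_page(conn: sqlite3.Connection, max_attempts: int = 10) -> str:
    """Insert a fresh random URL, retrying if the database reports a duplicate."""
    for _ in range(max_attempts):
        candidate = generate_new_url()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO pages (url) VALUES (?)", (candidate,))
            return candidate
        except sqlite3.IntegrityError:
            continue  # duplicate: try another string
    raise RuntimeError("no unused URL found; consider a longer length")
```

Because the database enforces uniqueness at insert time, there is no separate "check, then write" step for two simultaneous requests to race on.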
Dogukan Tufekci answered Feb 19 '23 20:02