
Python: Performing Modulo on Strings

I have several billion strings in the format word0.word1.word2, and I wish to perform modulo n on those strings so that I can feed each to a database writer for storage. I know I can perform a form of modulo 10 on the first character of the strings like this:

for i in ["a.b", "c.d"]:
    print(ord(i[0]) % 10)  # code point of the first character, modulo 10

This won't divide my strings evenly, though, as word0, word1, and word2 are sorted into alphabetical order, and the first character of the string is very often "a". I could take the last letter of the string, but I am not sure whether those are uniformly distributed.

My question: Is there a fast way to perform something like "ord" on the entire string? I ultimately plan to run modulo 48 on the integer representations of the strings, and wish for that modular output to be uniformly distributed across all 48 cores. I would be grateful for any help others can offer.

asked Oct 19 '22 06:10 by duhaime


1 Answer

s = "whatever"      # have a string
h = hash(s)         # obtain its hash
bin_index = h % 48  # find the bin (0..47); named so it does not shadow the built-in bin()

Update: Python's built-in hash function provides deterministic values only within a single process. If you want to keep this information (directly or indirectly) in a database, you have to use an explicit hash function that doesn't include any random data. (Credit goes to @Alik)
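
One possible way to get a process-independent bin is to hash the string with hashlib and reduce the digest modulo 48. This is only a sketch, not something from the original answer: the function name stable_bin, the choice of MD5 (used here purely as a mixing function, not for security), and the synthetic keys are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def stable_bin(s, n_bins=48):
    """Map a string to a bin in range(n_bins), identically in every process.

    Hypothetical helper: MD5 is an assumed choice of hash, used only
    to spread keys, not for any security purpose.
    """
    digest = hashlib.md5(s.encode("utf-8")).digest()
    # Interpret the 16-byte digest as one big integer, then reduce it.
    return int.from_bytes(digest, "big") % n_bins


# Rough look at the spread, using made-up keys in the word0.word1.word2 shape.
keys = ["word%d.word%d.word%d" % (i, i + 1, i + 2) for i in range(48000)]
counts = Counter(stable_bin(k) for k in keys)
print(len(counts), min(counts.values()), max(counts.values()))
```

Because the result depends only on the bytes of the string, the same key should land in the same bin on every run and on every machine. If hashing speed matters more than mixing quality, zlib.crc32(s.encode("utf-8")) % 48 may be worth measuring as a cheaper alternative, though its spread on real data would need checking.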

answered Nov 03 '22 07:11 by dlask