
How to encode a text string into a number in Python?

Let's say you have a string:

mystring = "Welcome to the InterStar cafe, serving you since 2412!"

I am looking for a way to convert that string into a number, like say:

encoded_string = number_encode(mystring)

print(encoded_string)

08713091353153848093820430298

...that you can convert back to the original string:

decoded_string = number_decode(encoded_string)

print(decoded_string)

"Welcome to the InterStar cafe, serving you since 2412!"

It doesn't have to be cryptographically secure, but it does have to put out the same number for the same string regardless of what computer it's running on.

lespaul asked Jan 27 '23 13:01


2 Answers

Encode it to bytes in a fixed encoding, then convert the bytes to an int with int.from_bytes. The reverse operation is to call .to_bytes on the resulting int, then decode back to str:

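mystring = "Welcome to the InterStar cafe, serving you since 2412!"
# A fixed encoding gives the same bytes on every machine
mybytes = mystring.encode('utf-8')
# Interpret the bytes as one large little-endian integer
myint = int.from_bytes(mybytes, 'little')
print(myint)
# Round the bit length up to whole bytes to get the original length back
recoveredbytes = myint.to_bytes((myint.bit_length() + 7) // 8, 'little')
recoveredstring = recoveredbytes.decode('utf-8')
print(recoveredstring)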


This has one flaw: if the string ends in NUL characters ('\0', i.e. '\x00'), you'll lose them (switching to 'big' byte order would lose them from the front instead). If that's a problem, you can explicitly pad with a '\x01' byte and strip it on the decode side, so there are no trailing zero bytes to lose:

mystring = "Welcome to the InterStar cafe, serving you since 2412!"
mybytes = mystring.encode('utf-8') + b'\x01'  # Pad with 1 to preserve trailing zeroes
myint = int.from_bytes(mybytes, 'little')
print(myint)
recoveredbytes = myint.to_bytes((myint.bit_length() + 7) // 8, 'little')
recoveredstring = recoveredbytes[:-1].decode('utf-8') # Strip pad before decoding
print(recoveredstring)
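
For convenience, the padded version can be wrapped into the two functions the question asks for. This is only a sketch: number_encode and number_decode are the names used in the question, not built-ins, and the check for the '\x01' pad is an extra assumption added here to reject numbers that did not come from number_encode.

```python
def number_encode(text: str) -> int:
    """Encode text as a non-negative int (hypothetical helper)."""
    # Append a 0x01 pad byte so trailing NUL bytes are not dropped.
    data = text.encode('utf-8') + b'\x01'
    return int.from_bytes(data, 'little')


def number_decode(number: int) -> str:
    """Invert number_encode (hypothetical helper)."""
    data = number.to_bytes((number.bit_length() + 7) // 8, 'little')
    # The most significant byte must be the pad added by number_encode.
    if not data.endswith(b'\x01'):
        raise ValueError("not a number produced by number_encode")
    return data[:-1].decode('utf-8')


original = "Welcome to the InterStar cafe, serving you since 2412!"
encoded = number_encode(original)
assert number_decode(encoded) == original

# Trailing NULs and the empty string survive thanks to the pad byte.
assert number_decode(number_encode("abc\0\0")) == "abc\0\0"
assert number_decode(number_encode("")) == ""
```

The result is a Python int, so it never has leading zeros; use str(encoded) if you need the decimal digits as text.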

ShadowRanger answered Jan 29 '23 01:01


If you simply want to make a string unreadable to a human, you can use Base64 via base64.b64encode(s, altchars=None) and base64.b64decode(s, altchars=None, validate=False).

Note that these functions require a bytes-like object, so write your string literals with a b prefix, such as b"I am a bytes-like string":

>>> import base64
>>> coded = base64.b64encode(b"Welcome to the InterStar cafe, serving you since 2412!")
>>> print(coded)
b'V2VsY29tZSB0byB0aGUgSW50ZXJTdGFyIGNhZmUsIHNlcnZpbmcgeW91IHNpbmNlIDI0MTIh'
>>> print(base64.b64decode(coded))
b'Welcome to the InterStar cafe, serving you since 2412!'

If you already have your strings, you can convert them with str.encode('utf-8'):

>>> myString = "Welcome to the InterStar cafe, serving you since 2412!"
>>> bString = myString.encode('utf-8')
>>> print(bString)
b'Welcome to the InterStar cafe, serving you since 2412!'
>>> print(bString.decode())
Welcome to the InterStar cafe, serving you since 2412!
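
Putting the two steps together, a round trip that starts and ends with an ordinary str might look like the sketch below. The variable names are only illustrative.

```python
import base64

text = "Welcome to the InterStar cafe, serving you since 2412!"

# str -> UTF-8 bytes -> Base64 bytes -> ASCII str
coded = base64.b64encode(text.encode('utf-8')).decode('ascii')
print(coded)

# Reverse the steps: Base64 text -> bytes -> str
restored = base64.b64decode(coded).decode('utf-8')
assert restored == text
```

Keep in mind that Base64 output contains letters and symbols as well as digits, so it is not a number.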

If you really need to convert the string to a number made up only of digits, use @ShadowRanger's answer.

Ender Look answered Jan 29 '23 01:01