
Converting Byte to String and Back Properly in Python3?

Given an arbitrary bytes object (i.e. not only numbers/characters!), I need to convert it to a string and then back to the initial bytes without losing information. This seems like a basic task, but I ran into the following problems:

Assuming:

rnd_bytes = b'w\x12\x96\xb8'
len(rnd_bytes)

prints: 4

Now, converting it to a string. Note: I need to set backslashreplace, as decoding otherwise raises a UnicodeDecodeError, and any other error-handler value would lose information.

my_str = rnd_bytes.decode('utf-8' , 'backslashreplace')

Now I have the string. I want to convert it back to exactly the original bytes (length 4!):

According to Python resources and this answer, there are different possibilities:

conv_bytes = bytes(my_str, 'utf-8')
conv_bytes = my_str.encode('utf-8')

But len(conv_bytes) returns 10.

I tried to analyse the outcome:

>>> repr(rnd_bytes)
"b'w\\x12\\x96\\xb8'"
>>> repr(my_str)
"'w\\x12\\\\x96\\\\xb8'"
>>> repr(conv_bytes)
"b'w\\x12\\\\x96\\\\xb8'"
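For reference, the length of 10 follows from how the backslashreplace handler works: the bytes for 'w' and '\x12' are valid UTF-8, while 0x96 and 0xb8 are not valid on their own, so each is replaced by a four-character escape text. Encoding that text again simply writes those characters out as bytes. A minimal sketch that reproduces the count, using only the values from the question:

```python
rnd_bytes = b'w\x12\x96\xb8'

# 0x96 and 0xb8 are not valid UTF-8 here, so 'backslashreplace'
# substitutes the literal text \x96 and \xb8 (four characters each).
my_str = rnd_bytes.decode('utf-8', 'backslashreplace')

print(len(my_str))   # 10: 'w' + '\x12' + 4 + 4
print(list(my_str))  # one list entry per character

# Encoding writes the escape text out character by character,
# so the result still has 10 bytes rather than the original 4.
conv_bytes = my_str.encode('utf-8')
print(len(conv_bytes))  # 10
```

In other words, the doubled backslashes in the repr output are single backslash characters in the string itself, not an artifact that replace() could strip.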

It would make sense to replace the '\\\\'. However, my_str.replace('\\\\','\\') doesn't change anything, probably because four backslashes in a literal represent only two. So my_str.replace('\\','\') would find the '\\\\', but leads to

SyntaxError: EOL while scanning string literal

due to the last argument '\'. This has been discussed here, where the following suggestion came up:

>>> my_str2=my_str.encode('utf_8').decode('unicode_escape')
>>> repr(my_str2)
"'w\\x12\\x96¸'"

This replaces the '\\\\' but seems to add / change some other characters:

>>> conv_bytes2 = my_str2.encode('utf8')
>>> len(conv_bytes2)
6
>>> repr(conv_bytes2)
"b'w\\x12\\xc2\\x96\\xc2\\xb8'"
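The six bytes can be explained as follows: the unicode_escape step produces a string with the code points U+0096 and U+00B8, and UTF-8 encodes each of those as two bytes. The sketch below shows that encoding with latin-1 (one byte per code point below 256) happens to recover the original for this input. It should not be read as a general solution: as the counterexample with the hypothetical value `tricky` shows, input that already contains a backslash is not preserved.

```python
rnd_bytes = b'w\x12\x96\xb8'
my_str = rnd_bytes.decode('utf-8', 'backslashreplace')
my_str2 = my_str.encode('utf-8').decode('unicode_escape')

# Four characters, one per original byte value.
print([hex(ord(c)) for c in my_str2])  # ['0x77', '0x12', '0x96', '0xb8']

# UTF-8 needs two bytes for U+0096 and U+00B8, hence 6 bytes in total.
print(my_str2.encode('utf-8'))    # b'w\x12\xc2\x96\xc2\xb8'

# latin-1 maps code points 0..255 to single bytes.
print(my_str2.encode('latin-1'))  # b'w\x12\x96\xb8'

# Counterexample (hypothetical input): four bytes, backslash, 'x', '4', '1'.
tricky = b'\\x41'
s = tricky.decode('utf-8', 'backslashreplace')
back = s.encode('utf-8').decode('unicode_escape').encode('latin-1')
print(back)  # b'A', which differs from the original four bytes
```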

There must be a proper way to convert (arbitrary) bytes to a string and back. How can I achieve that?

black asked Feb 19 '26 14:02


1 Answer

You could try to convert it to hex format. Then it is easy to convert it back to byte format.

Sample code to convert bytes to string:

hex_str = rnd_bytes.hex()

Here is what 'hex_str' looks like:

'771296b8'

And code for converting it back to bytes:

new_rnd_bytes = bytes.fromhex(hex_str)

The result is:

b'w\x12\x96\xb8'
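To check that the hex round trip is lossless for arbitrary input, it can be wrapped in two small helpers and run against random data. This is only a sketch: the helper names `bytes_to_text` and `text_to_bytes` are made up for illustration, while `bytes.hex()`, `bytes.fromhex()` and `os.urandom()` are standard library calls.

```python
import os


def bytes_to_text(data: bytes) -> str:
    """Return a printable hex string, two characters per byte."""
    return data.hex()


def text_to_bytes(text: str) -> bytes:
    """Inverse of bytes_to_text."""
    return bytes.fromhex(text)


rnd_bytes = b'w\x12\x96\xb8'
hex_str = bytes_to_text(rnd_bytes)
print(hex_str)  # 771296b8
assert text_to_bytes(hex_str) == rnd_bytes

# Round-trip check on random samples of 16 bytes each.
for _ in range(1000):
    sample = os.urandom(16)
    assert text_to_bytes(bytes_to_text(sample)) == sample
```

The trade-off is size: the hex string is always twice as long as the original bytes.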

If you need a string with one character per byte for processing, you can use:

readable_str = ''.join(chr(int(hex_str[i:i+2], 16)) for i in range(0, len(hex_str), 2))

But never try to encode the readable string directly. Here is what the readable string looks like:

'w\x12\x96¸'
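As a side note, the readable string built this way is the same one that decoding with latin-1 produces, since latin-1 maps each byte value 0..255 to the code point with the same number. The sketch below illustrates that, and shows why a UTF-8 encode of the readable string goes wrong. It assumes processing leaves every character below U+0100; otherwise the latin-1 encode would raise an error.

```python
rnd_bytes = b'w\x12\x96\xb8'
hex_str = rnd_bytes.hex()

# Expression from the answer: one character per pair of hex digits.
readable_str = ''.join(chr(int(hex_str[i:i+2], 16)) for i in range(0, len(hex_str), 2))

# Same result as decoding the bytes as latin-1.
assert readable_str == rnd_bytes.decode('latin-1')

# UTF-8 expands the two characters above U+007F to two bytes each.
print(readable_str.encode('utf-8'))    # b'w\x12\xc2\x96\xc2\xb8'

# latin-1 writes one byte per character and restores the original here.
print(readable_str.encode('latin-1'))  # b'w\x12\x96\xb8'
```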

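After processing the readable string, convert it back to hex format before converting it to bytes. Each value must be zero-padded to two hex digits, otherwise characters below 0x10 produce a single digit and the hex string becomes invalid:

hex_str = ''.join(format(ord(c), '02x') for c in readable_str)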
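The conversion back to hex only works if every character contributes exactly two hex digits. The sketch below compares a slicing approach such as `str(hex(ord(i)))[2:4]` with explicit zero-padding, using a made-up input that contains a value below 0x10.

```python
readable_str = 'w\x05\x96\xb8'  # hypothetical input with a value below 0x10

# hex(5) is '0x5', so slicing off the prefix leaves a single digit.
unpadded = ''.join(str(hex(ord(i)))[2:4] for i in readable_str)

# format(..., '02x') always yields two lowercase hex digits.
padded = ''.join(format(ord(c), '02x') for c in readable_str)

print(unpadded)  # 77596b8  (7 digits, rejected by bytes.fromhex)
print(padded)    # 770596b8
print(bytes.fromhex(padded))  # b'w\x05\x96\xb8'
```

Both variants assume that every character is below U+0100; a larger code point would need more than two hex digits.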
Ozgur Bagci answered Feb 22 '26 04:02