I have this sequence and I have to decode it, as a complete beginner in Python and in encoding.
enc = b'\x80\x03}q\x00(K\x01K\x01K\x02K\x03K\x03K\x06K\x04G?\xc5UUUUUUK\x05G?\xe0\x00\x00\x00\x00\x00\x00K\x06G?\x9cq\xc7\x1cq\xc7\x1cK\x07G?\xc5UUUUUUK\x08K$K\tG?\xb5UUUUUUK\nK\x07K\x0bG?\xe5UUUUUUK\x0cG?\xb5UUUUUUK\rG?\xedUUUUUUK\x0eK4K\x0fG?\xb3\xb1;\x13\xb1;\x14K\x10K\x00K\x11G?\xcd\x89\xd8\x9d\x89\xd8\x9eK\x12G?\xcb\x9b\x9b\x9b\x9b\x9b\x9cK\x13G?\xa4\x14\x14\x14\x14\x14\x14K\x14X\x08\x00\x00\x00discretaq\x01K\x15K\x02K\x16X\x02\x00\x00\x00daq\x02K\x17G?\xe4z\xe1G\xae\x14{K\x18G@\x15\x00\x00\x00\x00\x00\x00K\x19G?\xe4z\xe1G\xae\x14|K\x1aK2K\x1bK\x01K\x1cK\x03K\x1dG?\xd5UUUUUUK\x1eG?\xc5UUUUUUK\x1fK\x01K K\x04K!G?\xaf\xf2\xe4\x8e\x8aq\xdeK"K\x04K#X\x04\x00\x00\x00mareq\x03u.'
I tried doing it this way
strputere = enc.decode()
print(strputere)
and I get an error
File "encode.py", line 4, in <module>
strputere = enc.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
I started doing a bit of research, and I found that b stands for bytes.
So my enc variable is a bytes string literal. I've looked into .decode() and it seemed like it was a good choice - but it might be not.
I'm a bit confused because it is a bytes string literal, but it contains some characters (such as \x80) that I think they are UTF-8 characters.
So, how can I decode this, and what would be the algorithm for that? I would love to understand what happens, I did my research but I'm a bit lost, I'd need some help.
So, generally when you have a byte sequence you have two different ways to approach it, depending on the contents:
If dealing with a pure string sequence, you need to decode using the following:
enc.decode("utf-8")
Keep in mind that in this case, you must know what encoding was used (here utf-8). But it appears that it might be incorrect according to the error message you got. S
If you don't know the encoding but you know its definitely a string-encoding, you can take a look at the options mentioned in this question here
If you are using an embedded device, or any bytes input that might contain a series of data, and not just one field, you must use struct.unpack(). This is a bit more complicated, and you will need to go through the docs to find the exact string you must use to decode.
The way it works is that you tell python what each bytes are (string, int, etc) and how long each one is, and it will convert it into a tuple of objects as follows:
values = list(struct.unpack('>BBHBBhBHhHL', enc))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With