
How to make a fixed-size byte variable in Python

Let's say I have a string (Unicode, if it matters) variable that is less than 100 bytes long. I want to create another variable that is exactly 100 bytes in size, contains this string, and is padded with zeros or some other filler. How would I do this in Python 3?

Mikael S. asked Jun 17 '14 18:06


4 Answers

For assembling packets to go over the network, or for assembling byte-perfect binary files, I suggest using the struct module.

  • struct — Interpret bytes as packed binary data

Just for the string, you might not need struct, but as soon as you start also packing binary values, struct will make your life much easier.
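As a minimal sketch of the struct approach: the "100s" format code produces a fixed 100-byte field, padding with null bytes or truncating as needed. The UTF-8 encoding and the 2-byte length prefix are illustrative assumptions, not something the question specifies.

```python
import struct

text = "具有"
payload = text.encode("utf-8")  # 6 bytes for these two characters

# "100s" packs a bytes value into exactly 100 bytes,
# padding with null bytes (or truncating if it is longer).
record = struct.pack("100s", payload)
print(len(record))  # 100

# Assumed layout for illustration: a 2-byte big-endian length,
# followed by the fixed 100-byte field.
packet = struct.pack("!H100s", len(payload), payload)
length, field = struct.unpack("!H100s", packet)
recovered = field[:length].decode("utf-8")
print(recovered == text)  # True
```

Storing the real length alongside the field lets the reader strip the padding without guessing where the data ends.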

Depending on your needs, you might be better off with an off-the-shelf network serialization library, such as Protocol Buffers; or you might even just use JSON for the wire format.

  • Protocol Buffer Basics: Python
  • PyMOTW - JavaScript Object Notation Serializer

steveha answered Sep 29 '22 06:09


Something like this should work:

st = "具有"
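# Encode first: the padding is counted in bytes, not characters
by = bytes(st, "utf-8")
# Note: b"0" is the ASCII digit zero (0x30); use b"\x00" to pad with null bytes instead
by += b"0" * (100 - len(by))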
print(by)
# b'\xe5\x85\xb7\xe6\x9c\x890000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'

Obligatory addendum, since your original post seems to conflate the length of a string with the length of its encoded byte representation: Python unicode explanation
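To illustrate the distinction, here is a short sketch comparing the character count of a string with the byte count of two encodings. The choice of UTF-16 as the second encoding is only for comparison.

```python
text = "具有"

char_count = len(text)                        # number of Unicode characters
utf8_count = len(text.encode("utf-8"))        # bytes when encoded as UTF-8
utf16_count = len(text.encode("utf-16-le"))   # bytes when encoded as UTF-16 (no BOM)

print(char_count)   # 2
print(utf8_count)   # 6
print(utf16_count)  # 4
```

The amount of padding needed therefore depends on the encoding, so the string should be encoded before its length is measured.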

Daenyth answered Sep 29 '22 07:09


To pad with null bytes, you can do it the same way the stdlib base64 module does:

some_data = b'foosdsfkl\x05'
# bytes(n) creates n null bytes; a negative n (data longer than 100 bytes) raises ValueError
null_padded = some_data + bytes(100 - len(some_data))
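
The same idea can be wrapped in a small function. This is only a sketch: pad_to_size is a hypothetical helper name, and raising an error on oversized input is an assumed policy rather than anything the question requires. It uses bytes.ljust, which pads on the right with a single fill byte.

```python
def pad_to_size(data: bytes, size: int = 100, fill: bytes = b"\x00") -> bytes:
    """Return data right-padded with fill to exactly size bytes (hypothetical helper)."""
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    if len(data) > size:
        # Assumed policy: refuse oversized input instead of silently truncating it.
        raise ValueError(f"data is {len(data)} bytes, which exceeds the {size}-byte field")
    return data.ljust(size, fill)


padded = pad_to_size("具有".encode("utf-8"))
print(len(padded))  # 100
```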

Chris Wesseling answered Sep 29 '22 06:09


Here's a roundabout way of doing it:

>>> import sys
>>> a = "a"
>>> sys.getsizeof(a)
22
>>> a = "aa"
>>> sys.getsizeof(a)
23
>>> a = "aaa"
>>> sys.getsizeof(a)
24

So following this, an ASCII string whose reported size is 100 bytes needs to be 79 characters long (21 bytes of object overhead plus one byte per character):

>>> a = "a" * 79
>>> len(a)
79
>>> sys.getsizeof(a)
100

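The approach above is a fairly simple way of "calibrating" strings to figure out their sizes. Note that sys.getsizeof reports the in-memory size of the Python string object, including interpreter overhead, so the numbers above depend on the Python version and build and are not the length of the string's encoded bytes. You could automate a script to pad a string out to the appropriate memory size to account for other encodings.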

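import sys

def padder(strng):
    TARGETSIZE = 100  # target in-memory size as reported by sys.getsizeof
    padChar = "0"

    curSize = sys.getsizeof(strng)

    if curSize <= TARGETSIZE:
        # Left-pad in one step; this assumes each added character grows the
        # object by one byte, which holds for ASCII-only strings in CPython
        strng = padChar * (TARGETSIZE - curSize) + strng

    # Strings that start out larger than the target are returned unchanged;
    # handle that case here if you need to
    return strng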
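
For comparison, the following sketch (assuming CPython and a UTF-8 wire format, neither of which the question specifies) prints the object size reported by sys.getsizeof next to the length of the encoded bytes. Only the latter is what ends up in a file or a network packet.

```python
import sys

text = "具有"
encoded = text.encode("utf-8")

# In-memory size of the str object, including interpreter overhead.
# The exact value is implementation-specific, so no output is shown here.
object_size = sys.getsizeof(text)

# Number of bytes that would actually be written or sent.
payload_size = len(encoded)

print(object_size, payload_size)
```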

wnnmaw answered Sep 29 '22 05:09