
Packing and unpacking a variable-length array/string using the struct module in Python

I am trying to get a grip on packing and unpacking binary data in Python 3. It's actually not that hard to understand, except for one problem:

What if I have a variable-length text string and want to pack and unpack it in the most elegant manner?

As far as I can tell from the manual, I can only unpack fixed-size strings directly? In that case, is there an elegant way of getting around this limitation without padding with lots and lots of unnecessary zeroes?

agnsaft asked Sep 20 '10 16:09





2 Answers

The struct module only supports fixed-length structures. For variable-length strings, your options are either:

  • Dynamically construct your format string (a str has to be converted to bytes before passing it to pack()):

    s = bytes(s, 'utf-8')    # Or other appropriate encoding
    struct.pack("I%ds" % (len(s),), len(s), s)

  • Skip struct and just use normal string methods to add the string to your pack()-ed output: struct.pack("I", len(s)) + s
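
A minimal sketch of both options side by side, assuming UTF-8 text and a native-order unsigned int (`I`) length prefix. The function names `pack_dynamic` and `pack_concat` are made up for illustration:

```python
import struct

def pack_dynamic(text):
    """Option 1: build the format string from the payload length."""
    payload = text.encode("utf-8")
    return struct.pack("I%ds" % len(payload), len(payload), payload)

def pack_concat(text):
    """Option 2: pack only the length, then append the raw bytes."""
    payload = text.encode("utf-8")
    return struct.pack("I", len(payload)) + payload

# Both approaches yield a length prefix followed by the encoded payload.
# Note that the prefix counts bytes, not characters.
assert pack_dynamic("héllo") == pack_concat("héllo")
```

The second form is often easier to read, since the format string stays constant.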

For unpacking, you just have to unpack a bit at a time:

    (i,), data = struct.unpack("I", data[:4]), data[4:]
    s, data = data[:i], data[i:]
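
As a round-trip check, the two slicing steps can be wrapped in a function. `unpack_string` is a hypothetical name, and the sketch assumes an `I` length prefix and UTF-8 text:

```python
import struct

def unpack_string(data):
    """Read one length-prefixed string; return it with the unread remainder."""
    prefix = struct.calcsize("I")          # size of the length field in bytes
    (length,) = struct.unpack("I", data[:prefix])
    body = data[prefix:prefix + length]
    rest = data[prefix + length:]
    return body.decode("utf-8"), rest

# A packed string followed by unrelated trailing bytes.
packed = struct.pack("I", 5) + b"hello" + b"trailing"
text, rest = unpack_string(packed)
print(text, rest)
```

Using `calcsize` instead of a hard-coded 4 keeps the slice correct if the prefix format is later changed.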

If you're doing a lot of this, you can always add a helper function which uses calcsize to do the string slicing:

    def unpack_helper(fmt, data):
        size = struct.calcsize(fmt)
        return struct.unpack(fmt, data[:size]), data[size:]
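
For example, such a helper can walk a buffer that mixes fixed-size fields with length-prefixed strings. The record layout here (a little-endian unsigned short id followed by two strings) and the `read_string` function are invented purely for illustration:

```python
import struct

def unpack_helper(fmt, data):
    size = struct.calcsize(fmt)
    return struct.unpack(fmt, data[:size]), data[size:]

def read_string(data):
    """Read a '<I' length prefix, then that many bytes."""
    (length,), data = unpack_helper("<I", data)
    return data[:length], data[length:]

# Build a sample record: id, then two length-prefixed byte strings.
record = struct.pack("<H", 7)
for part in (b"spam", b"eggs!"):
    record += struct.pack("<I", len(part)) + part

# Consume the record field by field, carrying the remainder along.
(record_id,), rest = unpack_helper("<H", record)
first, rest = read_string(rest)
second, rest = read_string(rest)
```

An explicit byte order such as `<` avoids native alignment padding, which matters once several fixed-size fields share one format string.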
llasram answered Sep 21 '22 10:09



I googled this question and found a couple of solutions.

construct

An elaborate, flexible solution.

Instead of writing imperative code to parse a piece of data, you declaratively define a data structure that describes your data. As this data structure is not code, you can use it in one direction to parse data into Pythonic objects, and in the other direction, convert (“build”) objects into binary data.

The library provides both simple, atomic constructs (such as integers of various sizes), as well as composite ones which allow you to form hierarchical structures of increasing complexity. Construct features bit and byte granularity, easy debugging and testing, an easy-to-extend subclass system, and lots of primitive constructs to make your work easier.

Updated for Python 3.x and construct 2.10.67. construct now also has a native PascalString, so the example struct was renamed to myPascalString:

    from construct import *

    myPascalString = Struct(
        "length" / Int8ul,
        "data" / Bytes(lambda ctx: ctx.length)
    )

    >>> myPascalString.parse(b'\x05helloXXX')
    Container(length=5, data=b'hello')
    >>> myPascalString.build(Container(length=6, data=b"foobar"))
    b'\x06foobar'

    myPascalString2 = ExprAdapter(myPascalString,
        encoder=lambda obj, ctx: Container(length=len(obj), data=obj),
        decoder=lambda obj, ctx: obj.data
    )

    >>> myPascalString2.parse(b"\x05hello")
    b'hello'

    >>> myPascalString2.build(b"i'm a long string")
    b"\x11i'm a long string"

Edit: Also pay attention to the ExprAdapter. Once the native PascalString no longer does what you need, this is the approach you will end up using.

netstruct

A quick solution if you only need a struct extension for variable-length byte sequences. Nesting a variable-length structure can be achieved by packing the result of the first pack.

NetStruct supports a new formatting character, the dollar sign ($). The dollar sign represents a variable-length string, encoded with its length preceding the string itself.

Edit: It looks like the length of a variable-length string is stored using the same data type as the elements. Thus, the maximum length of a variable-length string is 255 for bytes, 65535 for words, and so on.
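
The same kind of limit applies to any length-prefixed encoding, and it can be illustrated with plain `struct` rather than netstruct itself: a one-byte unsigned prefix (`B`) cannot describe more than 255 bytes, while a two-byte prefix (`H`) reaches 65535. This is only a sketch of the principle, and the helper `pack_prefixed` is hypothetical:

```python
import struct

def pack_prefixed(prefix_fmt, payload):
    """Prefix payload with its length, stored using prefix_fmt."""
    return struct.pack(prefix_fmt, len(payload)) + payload

# 255 is the largest value an unsigned byte can hold, so this fits.
ok = pack_prefixed("B", b"x" * 255)

# One byte more and the length no longer fits in the prefix.
try:
    pack_prefixed("B", b"x" * 256)
    overflowed = False
except struct.error:
    overflowed = True

# A two-byte little-endian prefix handles the longer payload.
wider = pack_prefixed("<H", b"x" * 256)
```

Choosing the prefix type is therefore a trade-off between overhead per string and the longest string that can be represented.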

    import netstruct
    >>> netstruct.pack(b"b$", b"Hello World!")
    b'\x0cHello World!'

    >>> netstruct.unpack(b"b$", b"\x0cHello World!")
    [b'Hello World!']
Victor Sergienko answered Sep 19 '22 10:09
