Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes?

Q: How do you convert int to bytes?

An int value can be converted into bytes by using the method int. to_bytes(). The method is invoked on an int value, is not supported by Python 2 (requires minimum Python3) for execution.

Q: Why is Endianness necessary?

Note that each 2 hex letters represent 1 byte. In the end, that is why endianness is important; because not knowing how data is stored would lead to communicating different values. For example, all x86_64 processors (Intel/AMD) use little-endian while IP/TCP uses big-endian.

Q: What is little endian byte swap?

In little-endian style, the bytes are written from left to right in increasing significance. In big-endian style, the bytes are written from left to right in decreasing significance. The swapbytes function swaps the byte ordering in memory, converting little endian to big endian (and vice versa).

Tags:

python

optimization

I have this Python code to do this:

from struct import pack as _pack

def packl(lnum, pad = 1):
    if lnum < 0:
        raise RangeError("Cannot use packl to convert a negative integer "
                         "to a string.")
    count = 0
    l = []
    while lnum > 0:
        l.append(lnum & 0xffffffffffffffffL)
        count += 1
        lnum >>= 64
    if count <= 0:
        return '\0' * pad
    elif pad >= 8:
        lens = 8 * count % pad
        pad = ((lens != 0) and (pad - lens)) or 0
        l.append('>' + 'x' * pad + 'Q' * count)
        l.reverse()
        return _pack(*l)
    else:
        l.append('>' + 'Q' * count)
        l.reverse()
        s = _pack(*l).lstrip('\0')
        lens = len(s)
        if (lens % pad) != 0:
            return '\0' * (pad - lens % pad) + s
        else:
            return s

This takes approximately 174 usec to convert 2**9700 - 1 to a string of bytes on my machine. If I'm willing to use the Python 2.7 and Python 3.x specific bit_length method, I can shorten that to 159 usecs by pre-allocating the l array to be the exact right size at the very beginning and using l[something] = syntax instead of l.append.

Is there anything I can do that will make this faster? This will be used to convert large prime numbers used in cryptography as well as some (but not many) smaller numbers.

Edit

This is currently the fastest option in Python < 3.2, it takes about half the time either direction as the accepted answer:

def packl(lnum, padmultiple=1):
    """Packs the lnum (which must be convertable to a long) into a
       byte string 0 padded to a multiple of padmultiple bytes in size. 0
       means no padding whatsoever, so that packing 0 result in an empty
       string.  The resulting byte string is the big-endian two's
       complement representation of the passed in long."""

    if lnum == 0:
        return b'\0' * padmultiple
    elif lnum < 0:
        raise ValueError("Can only convert non-negative numbers.")
    s = hex(lnum)[2:]
    s = s.rstrip('L')
    if len(s) & 1:
        s = '0' + s
    s = binascii.unhexlify(s)
    if (padmultiple != 1) and (padmultiple != 0):
        filled_so_far = len(s) % padmultiple
        if filled_so_far != 0:
            s = b'\0' * (padmultiple - filled_so_far) + s
    return s

def unpackl(bytestr):
    """Treats a byte string as a sequence of base 256 digits
    representing an unsigned integer in big-endian format and converts
    that representation into a Python integer."""

    return int(binascii.hexlify(bytestr), 16) if len(bytestr) > 0 else 0

In Python 3.2 the int class has to_bytes and from_bytes functions that can accomplish this much more quickly that the method given above.

757

asked Dec 05 '10 10:12

Omnifarious

1 Answers

Here is a solution calling the Python/C API via ctypes. Currently, it uses NumPy, but if NumPy is not an option, it could be done purely with ctypes.

import numpy
import ctypes
PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
                               numpy.ctypeslib.ndpointer(numpy.uint8),
                               ctypes.c_size_t,
                               ctypes.c_int,
                               ctypes.c_int]

def packl_ctypes_numpy(lnum):
    a = numpy.zeros(lnum.bit_length()//8 + 1, dtype=numpy.uint8)
    PyLong_AsByteArray(lnum, a, a.size, 0, 1)
    return a

On my machine, this is 15 times faster than your approach.

Edit: Here is the same code using ctypes only and returning a string instead of a NumPy array:

import ctypes
PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
                               ctypes.c_char_p,
                               ctypes.c_size_t,
                               ctypes.c_int,
                               ctypes.c_int]

def packl_ctypes(lnum):
    a = ctypes.create_string_buffer(lnum.bit_length()//8 + 1)
    PyLong_AsByteArray(lnum, a, len(a), 0, 1)
    return a.raw

This is another two times faster, totalling to a speed-up factor of 30 on my machine.

146

answered Sep 27 '22 15:09

Sven Marnach

Related questions
                            
                                Python pandas equivalent to R's group_by, mutate, and ifelse
                            
                                I'm using Python regexes in a criminally inefficient manner
                            
                                Using "like" in a cursor/query with a parameter in python (django)
                            
                                Which gps library would you recommend for python?
                            
                                python and ruby - for what to use it? [closed]
                            
                                Remove ppc from compilation flags in python setup scripts
                            
                                How does a classmethod object work?
                            
                                Django: Is there a way to have the "through" model in a ManyToManyField in a different app to the model containing the ManyToManyField?
                            
                                What is a nicer alternative to a namedtuples _replace?
                            
                                Should I use fork or threads?
                            
                                What is it in Java standard library that Python's lacks?
                            
                                "With" statement in Python with multiple files to handle
                            
                                How to determine subprocess.Popen() failed when shell=True
                            
                                os.execv without args argument
                            
                                How do I maximize efficiency with numpy arrays?
                            
                                Get current window title with Python and Xorg
                            
                                Compare result from hexdigest() to a string
                            
                                Passing 'None' as function parameter (where parameter is a function)
                            
                                2d array of lists in python
                            
                                How to increase the font size of a Text widget?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With