Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Function speed improvement: Convert ints to list of 32bit ints

Tags:

python

I'm looking for speedy alternatives to my function. the goal is to make a list of 32 bit integers based of any length integers. The length is explicitly given in a tuple of (value, bitlength). This is part of a bit-banging procedure for a asynchronous interface which takes 4 32 bit integers per bus transaction.

All ints are unsigned, positive or zero, the length can vary between 0 and 2000

My inputs is a list of these tuples, the output should be integers with implicit 32 bit length, with the bits in sequential order. The remaining bits not fitting into 32 should also be returned.

input: [(0,128),(1,12),(0,32)]
output:[0, 0, 0, 0, 0x100000], 0, 12

I've spent a day or two on profiling with cProfile, and trying different methods, but I seem to be kind of stuck with functions that takes ~100k tuples in one second, which is kinda slow. Ideally i would like a 10x speedup, but I haven't got enough experience to know where to start. The ultimate goal for the speed of this is to be faster than 4M tuples per second.

Thanks for any help or suggestions.

the fastest i can do is:

def foo(tuples):
    '''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
    length = 0
    remlen = 0
    remint = 0
    i32list = []
    for a, b in tuples:
        n = (remint << (32-remlen)) | a #n = (a << (remlen)) | remint
        length += b
        if length > 32:
            len32 = int(length/32)
            for i in range(len32):
                i32list.append((n >> i*32) & 0xFFFFFFFF)
            remint = n >> (len32*32)
            remlen = length - len32*32
            length = remlen
        elif length == 32:
            appint = n & 0xFFFFFFFF
            remint = 0
            remlen = 0
            length -= 32
            i32list.append(appint)
        else:
            remint = n
            remlen = length
    return i32list, remint, remlen

this has very similar performance:

def tpli_2_32ili(tuples):
    '''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
#    binarylist = "".join([np.binary_repr(a, b) for a, b in inp]) # bin(a)[2:].rjust(b, '0')
    binarylist = "".join([bin(a)[2:].rjust(b, '0') for a, b in tuples])
    totallength = len(binarylist)
    tot32 = int(totallength/32)
    i32list = [int(binarylist[i:i+32],2) for i in range(0, tot32*32, 32) ]
    remlen = totallength - tot32*32
    remint = int(binarylist[-remlen:],2)
    return i32list, remint, remlen
like image 965
user55924 Avatar asked May 02 '18 13:05

user55924


People also ask

How fast is 8 bit division compared to 3 bit Division?

for division the exponents are subtracted which is very fast for 8 bit, takes < 5% of time. so in effect you are comparing a 3 byte division with a 4 byte division .

How to convert a list of strings to list of Ints?

You can use list comprehension instead of for loops to convert a list of strings to a list of ints as shown below. myList = ['1', '2', '23', '32', '12', '44', '34', '55', '46', '21'] new_list = [int(element) for element in myList] print("The list of strings is:") print(myList) print("The list of ints is:") print(new_list) Output:

What is the list of iNTS in Python?

The list of ints is: [1, 2, 23, 32, 12, 44, 34, 55, 46, 21] Convert a List of Lists of Strings to Ints Inplace in Python

How to convert a string to an integer in JavaScript?

While iteration, we will first check if the current string can be converted to int or not using the isdigit() method. If the string can be converted to an int, we will use the int()function to convert the string into an int. Otherwise, we will say that the current string cannot be converted into an integer.


2 Answers

The best I could come up with so far is a 25% speed-up

from functools import reduce

intMask = 0xffffffff

def f(x,y):
    return (x[0] << y[1]) + y[0], x[1] + y[1]

def jens(input):
    n, length = reduce( f , input, (0,0) )
    remainderBits = length % 32
    intBits = length - remainderBits
    remainder = ((n & intMask) << (32 - remainderBits)) >> (32 - remainderBits)
    n >>= remainderBits

    ints = [n & (intMask << i) for i in range(intBits-32, -32, -32)]
    return ints, remainderBits, remainder

print([hex(x) for x in jens([(0,128),(1,12),(0,32)])[0]])

It uses a long to sum up the tuple values according to their bit length, and then extract the 32-bit values and the remaining bits from this number. The computastion of the overall length (summing up the length values of the input tuple) and the computation of the large value are done in a single loop with reduce to use an intrinsic loop.

Running matineau's benchmark harness prints, the best numbers I have seen are:

Fastest to slowest execution speeds using Python 3.6.5
(1,000 executions, best of 3 repetitions)

          jens :  0.004151 secs, rel speed  1.00x,     0.00% slower
 First snippet :  0.005259 secs, rel speed  1.27x,    26.70% slower
Second snippet :  0.008328 secs, rel speed  2.01x,   100.64% slower

You could probably gain a better speed-up if you use some C extension implementing a bit array.

like image 142
Jens Avatar answered Oct 16 '22 14:10

Jens


This isn't an answer with a faster implementation. Instead it's the code in the two snippets you have in your question placed within an extensible benchmarking framework that makes comparing different approaches very easy.

Comparing just those two testcases, it indicates that your second approach does not have very similar performance to the first, based on the output shown. In fact, it's almost twice as slow.

from collections import namedtuple
import sys
from textwrap import dedent
import timeit
import traceback

N = 1000  # Number of executions of each "algorithm".
R = 3  # Number of repetitions of those N executions.

# Common setup for all testcases (executed before any algorithm specific setup).
COMMON_SETUP = dedent("""
    # Import any resources needed defined in outer benchmarking script.
    #from __main__ import ??? # Not needed at this time
""")


class TestCase(namedtuple('CodeFragments', ['setup', 'test'])):
    """ A test case is composed of separate setup and test code fragments. """
    def __new__(cls, setup, test):
        """ Dedent code fragment in each string argument. """
        return tuple.__new__(cls, (dedent(setup), dedent(test)))


testcases = {
    "First snippet": TestCase("""
        def foo(tuples):
            '''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
            length = 0
            remlen = 0
            remint = 0
            i32list = []
            for a, b in tuples:
                n = (remint << (32-remlen)) | a #n = (a << (remlen)) | remint
                length += b
                if length > 32:
                    len32 = int(length/32)
                    for i in range(len32):
                        i32list.append((n >> i*32) & 0xFFFFFFFF)
                    remint = n >> (len32*32)
                    remlen = length - len32*32
                    length = remlen
                elif length == 32:
                    appint = n & 0xFFFFFFFF
                    remint = 0
                    remlen = 0
                    length -= 32
                    i32list.append(appint)
                else:
                    remint = n
                    remlen = length

            return i32list, remint, remlen
        """, """
        foo([(0,128),(1,12),(0,32)])
        """

    ),
    "Second snippet": TestCase("""
        def tpli_2_32ili(tuples):
            '''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
            binarylist = "".join([bin(a)[2:].rjust(b, '0') for a, b in tuples])
            totallength = len(binarylist)
            tot32 = int(totallength/32)
            i32list = [int(binarylist[i:i+32],2) for i in range(0, tot32*32, 32) ]
            remlen = totallength - tot32*32
            remint = int(binarylist[-remlen:],2)
            return i32list, remint, remlen
        """, """
        tpli_2_32ili([(0,128),(1,12),(0,32)])
        """
    ),
}

# Collect timing results of executing each testcase multiple times.
try:
    results = [
        (label,
         min(timeit.repeat(testcases[label].test,
                           setup=COMMON_SETUP + testcases[label].setup,
                           repeat=R, number=N)),
        ) for label in testcases
    ]
except Exception:
    traceback.print_exc(file=sys.stdout)  # direct output to stdout
    sys.exit(1)

# Display results.
major, minor, micro = sys.version_info[:3]
print('Fastest to slowest execution speeds using Python {}.{}.{}\n'
      '({:,d} executions, best of {:d} repetitions)'.format(major, minor, micro, N, R))
print()

longest = max(len(result[0]) for result in results)  # length of longest label
ranked = sorted(results, key=lambda t: t[1]) # ascending sort by execution time
fastest = ranked[0][1]
for result in ranked:
    print('{:>{width}} : {:9.6f} secs, rel speed {:5,.2f}x, {:8,.2f}% slower '
          ''.format(
                result[0], result[1], round(result[1]/fastest, 2),
                round((result[1]/fastest - 1) * 100, 2),
                width=longest))

Output:

Fastest to slowest execution speeds using Python 3.6.5
(1,000 executions, best of 3 repetitions)

 First snippet :  0.003024 secs, rel speed  1.00x,     0.00% slower
Second snippet :  0.005085 secs, rel speed  1.68x,    68.13% slower
like image 1
martineau Avatar answered Oct 16 '22 13:10

martineau