I'm looking for speedy alternatives to my function. the goal is to make a list of 32 bit integers based of any length integers. The length is explicitly given in a tuple of (value, bitlength). This is part of a bit-banging procedure for a asynchronous interface which takes 4 32 bit integers per bus transaction.
All ints are unsigned, positive or zero, the length can vary between 0 and 2000
My inputs is a list of these tuples, the output should be integers with implicit 32 bit length, with the bits in sequential order. The remaining bits not fitting into 32 should also be returned.
input: [(0,128),(1,12),(0,32)]
output:[0, 0, 0, 0, 0x100000], 0, 12
I've spent a day or two on profiling with cProfile, and trying different methods, but I seem to be kind of stuck with functions that takes ~100k tuples in one second, which is kinda slow. Ideally i would like a 10x speedup, but I haven't got enough experience to know where to start. The ultimate goal for the speed of this is to be faster than 4M tuples per second.
Thanks for any help or suggestions.
the fastest i can do is:
def foo(tuples):
'''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
length = 0
remlen = 0
remint = 0
i32list = []
for a, b in tuples:
n = (remint << (32-remlen)) | a #n = (a << (remlen)) | remint
length += b
if length > 32:
len32 = int(length/32)
for i in range(len32):
i32list.append((n >> i*32) & 0xFFFFFFFF)
remint = n >> (len32*32)
remlen = length - len32*32
length = remlen
elif length == 32:
appint = n & 0xFFFFFFFF
remint = 0
remlen = 0
length -= 32
i32list.append(appint)
else:
remint = n
remlen = length
return i32list, remint, remlen
this has very similar performance:
def tpli_2_32ili(tuples):
'''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
# binarylist = "".join([np.binary_repr(a, b) for a, b in inp]) # bin(a)[2:].rjust(b, '0')
binarylist = "".join([bin(a)[2:].rjust(b, '0') for a, b in tuples])
totallength = len(binarylist)
tot32 = int(totallength/32)
i32list = [int(binarylist[i:i+32],2) for i in range(0, tot32*32, 32) ]
remlen = totallength - tot32*32
remint = int(binarylist[-remlen:],2)
return i32list, remint, remlen
for division the exponents are subtracted which is very fast for 8 bit, takes < 5% of time. so in effect you are comparing a 3 byte division with a 4 byte division .
You can use list comprehension instead of for loops to convert a list of strings to a list of ints as shown below. myList = ['1', '2', '23', '32', '12', '44', '34', '55', '46', '21'] new_list = [int(element) for element in myList] print("The list of strings is:") print(myList) print("The list of ints is:") print(new_list) Output:
The list of ints is: [1, 2, 23, 32, 12, 44, 34, 55, 46, 21] Convert a List of Lists of Strings to Ints Inplace in Python
While iteration, we will first check if the current string can be converted to int or not using the isdigit() method. If the string can be converted to an int, we will use the int()function to convert the string into an int. Otherwise, we will say that the current string cannot be converted into an integer.
The best I could come up with so far is a 25% speed-up
from functools import reduce
intMask = 0xffffffff
def f(x,y):
return (x[0] << y[1]) + y[0], x[1] + y[1]
def jens(input):
n, length = reduce( f , input, (0,0) )
remainderBits = length % 32
intBits = length - remainderBits
remainder = ((n & intMask) << (32 - remainderBits)) >> (32 - remainderBits)
n >>= remainderBits
ints = [n & (intMask << i) for i in range(intBits-32, -32, -32)]
return ints, remainderBits, remainder
print([hex(x) for x in jens([(0,128),(1,12),(0,32)])[0]])
It uses a long
to sum up the tuple values according to their bit length, and then extract the 32-bit values and the remaining bits from this number. The computastion of the overall length (summing up the length values of the input tuple) and the computation of the large value are done in a single loop with reduce to use an intrinsic loop.
Running matineau's benchmark harness prints, the best numbers I have seen are:
Fastest to slowest execution speeds using Python 3.6.5
(1,000 executions, best of 3 repetitions)
jens : 0.004151 secs, rel speed 1.00x, 0.00% slower
First snippet : 0.005259 secs, rel speed 1.27x, 26.70% slower
Second snippet : 0.008328 secs, rel speed 2.01x, 100.64% slower
You could probably gain a better speed-up if you use some C extension implementing a bit array.
This isn't an answer with a faster implementation. Instead it's the code in the two snippets you have in your question placed within an extensible benchmarking framework that makes comparing different approaches very easy.
Comparing just those two testcases, it indicates that your second approach does not have very similar performance to the first, based on the output shown. In fact, it's almost twice as slow.
from collections import namedtuple
import sys
from textwrap import dedent
import timeit
import traceback
N = 1000 # Number of executions of each "algorithm".
R = 3 # Number of repetitions of those N executions.
# Common setup for all testcases (executed before any algorithm specific setup).
COMMON_SETUP = dedent("""
# Import any resources needed defined in outer benchmarking script.
#from __main__ import ??? # Not needed at this time
""")
class TestCase(namedtuple('CodeFragments', ['setup', 'test'])):
""" A test case is composed of separate setup and test code fragments. """
def __new__(cls, setup, test):
""" Dedent code fragment in each string argument. """
return tuple.__new__(cls, (dedent(setup), dedent(test)))
testcases = {
"First snippet": TestCase("""
def foo(tuples):
'''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
length = 0
remlen = 0
remint = 0
i32list = []
for a, b in tuples:
n = (remint << (32-remlen)) | a #n = (a << (remlen)) | remint
length += b
if length > 32:
len32 = int(length/32)
for i in range(len32):
i32list.append((n >> i*32) & 0xFFFFFFFF)
remint = n >> (len32*32)
remlen = length - len32*32
length = remlen
elif length == 32:
appint = n & 0xFFFFFFFF
remint = 0
remlen = 0
length -= 32
i32list.append(appint)
else:
remint = n
remlen = length
return i32list, remint, remlen
""", """
foo([(0,128),(1,12),(0,32)])
"""
),
"Second snippet": TestCase("""
def tpli_2_32ili(tuples):
'''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
binarylist = "".join([bin(a)[2:].rjust(b, '0') for a, b in tuples])
totallength = len(binarylist)
tot32 = int(totallength/32)
i32list = [int(binarylist[i:i+32],2) for i in range(0, tot32*32, 32) ]
remlen = totallength - tot32*32
remint = int(binarylist[-remlen:],2)
return i32list, remint, remlen
""", """
tpli_2_32ili([(0,128),(1,12),(0,32)])
"""
),
}
# Collect timing results of executing each testcase multiple times.
try:
results = [
(label,
min(timeit.repeat(testcases[label].test,
setup=COMMON_SETUP + testcases[label].setup,
repeat=R, number=N)),
) for label in testcases
]
except Exception:
traceback.print_exc(file=sys.stdout) # direct output to stdout
sys.exit(1)
# Display results.
major, minor, micro = sys.version_info[:3]
print('Fastest to slowest execution speeds using Python {}.{}.{}\n'
'({:,d} executions, best of {:d} repetitions)'.format(major, minor, micro, N, R))
print()
longest = max(len(result[0]) for result in results) # length of longest label
ranked = sorted(results, key=lambda t: t[1]) # ascending sort by execution time
fastest = ranked[0][1]
for result in ranked:
print('{:>{width}} : {:9.6f} secs, rel speed {:5,.2f}x, {:8,.2f}% slower '
''.format(
result[0], result[1], round(result[1]/fastest, 2),
round((result[1]/fastest - 1) * 100, 2),
width=longest))
Output:
Fastest to slowest execution speeds using Python 3.6.5
(1,000 executions, best of 3 repetitions)
First snippet : 0.003024 secs, rel speed 1.00x, 0.00% slower
Second snippet : 0.005085 secs, rel speed 1.68x, 68.13% slower
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With