I use the following code to load 24-bit binary data
into a 16-bit numpy
array :
temp = numpy.zeros((len(data) / 3, 4), dtype='b')
temp[:, 1:] = numpy.frombuffer(data, dtype='b').reshape(-1, 3)
temp2 = temp.view('<i4').flatten() >> 16 # >> 16 because I need to divide by 2**16 to load my data into 16-bit array, needed for my (audio) application
output = temp2.astype('int16')
I imagine that it's possible to improve the speed efficiency, but how?
It seems like you are being very roundabout here. Won't this do the same thing?
output = np.frombuffer(data,'b').reshape(-1,3)[:,1:].flatten().view('i2')
This would save some time from not zero-filling a temporary array, skipping the bitshift and avoiding some unneceessary data moves. I haven't actually benchmarked it yet, though, and I expect the savings to be modest.
Edit: I have now performed the benchmark. For len(data)
of 12 million, I get 80 ms for your version and 39 ms for mine, so pretty much exactly a factor 2 speedup. Not a very big improvement, as expected, but then your starting point was already pretty fast.
Edit2: I should mention that I have assumed little endian here. However, the original question's code is also implicitly assuming little endian, so this is not a new assumption on my part.
(For big endian (data and architecture), you would replace 1:
by :-1
. If the data had a different endianness than the CPU, then you would also need to reverse the order of the bytes (::-1
).)
Edit3: For even more speed, I think you will have to go outside python. This fortran function, which also uses openMP, gets me a factor 2+ speedup compared to my version (so 4+ times faster than yours).
subroutine f(a,b)
implicit none
integer*1, intent(in) :: a(:)
integer*1, intent(out) :: b(size(a)*2/3)
integer :: i
!$omp parallel do
do i = 1, size(a)/3
b(2*(i-1)+1) = a(3*(i-1)+2)
b(2*(i-1)+2) = a(3*(i-1)+3)
end do
!$omp end parallel do
end subroutine
Compile with FOPT="-fopenmp" f2py -c -m basj{,.f90} -lgomp
. You can then import and use it in python:
import basj
def convert(data): return def mine2(data): return basj.f(np.frombuffer(data,'b')).view('i2')
You can control the number of cores to use via the environment variavble OMP_NUM_THREADS
, but it defaults to using all available cores.
Inspired by @amaurea's answer, here is a cython
version (I already used cython in my original code, so I'll continue with cython instead of mixing cython + fortran) :
import cython
import numpy as np
cimport numpy as np
def binary24_to_int16(char *data):
cdef int i
res = np.zeros(len(data)/3, np.int16)
b = <char *>((<np.ndarray>res).data)
for i in range(len(data)/3):
b[2*i] = data[3*i+1]
b[2*i+1] = data[3*i+2]
return res
There is a factor 4 speed gain :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With