I have a 1D NumPy array and a set of offset/length pairs. I would like to extract all entries that fall within [offset, offset+length) for each pair, and use them to build a new 'reduced' array that consists only of the values picked out by those pairs.
For a single offset/length pair this is trivial with standard array slicing: [offset:offset+length]. But how can I do this efficiently (i.e. without any loops) for many offset/length values?
Thanks, Mark
>>> import numpy as np
>>> a = np.arange(100)
>>> ind = np.concatenate((np.arange(5),np.arange(10,15),np.arange(20,30,2),np.array([8])))
>>> a[ind]
array([ 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 8])
There is the naive method; just doing the slices:
>>> import numpy as np
>>> a = np.arange(100)
>>>
>>> offset_length = [(3,10),(50,3),(60,20),(95,1)]
>>>
>>> np.concatenate([a[offset:offset+length] for offset,length in offset_length])
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50, 51, 52, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 95])
The following might be faster, but you would have to test/benchmark.
It works by constructing a list of the desired indices, which is a valid way of indexing a NumPy array (on Python 3, use range in place of xrange).
>>> indices = [offset + i for offset,length in offset_length for i in xrange(length)]
>>> a[indices]
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50, 51, 52, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 95])
It's not clear whether this would actually be faster than the naive method, but it might be if you have a lot of very short intervals.
(This last method is basically the same as @fraxel's solution, just using a different method of making the index list.)
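Since the question asks for something without loops, here is a minimal sketch that builds the same index array with no Python-level loop at all, using np.repeat and np.cumsum. The helper name indices_from_runs is made up for illustration (it is not a NumPy function), and I have not benchmarked it against the methods above.

```python
import numpy as np

def indices_from_runs(offsets, lengths):
    """Return the flat index array for all [offset, offset+length) runs.

    Hypothetical helper for illustration; not part of NumPy.
    """
    offsets = np.asarray(offsets, dtype=np.intp)
    lengths = np.asarray(lengths, dtype=np.intp)
    total = lengths.sum()
    # Position in the output at which each run begins.
    run_starts = np.cumsum(lengths) - lengths
    # Counter that restarts at 0 at the beginning of every run.
    within_run = np.arange(total) - np.repeat(run_starts, lengths)
    # Each run's offset, repeated once per element, plus the counter.
    return np.repeat(offsets, lengths) + within_run

a = np.arange(100)
offset_length = [(3, 10), (50, 3), (60, 20), (95, 1)]
offsets, lengths = zip(*offset_length)
reduced = a[indices_from_runs(offsets, lengths)]
```

The pairs are split into two sequences up front, so if you already hold the offsets and lengths as arrays, no unpacking is needed.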
I've tested a few different cases: a few short intervals, a few long intervals, lots of short intervals. I used the following script:
import timeit
setup = 'import numpy as np; a = np.arange(1000); offset_length = %s'
for title, ol in [('few short', '[(3,10),(50,3),(60,10),(95,1)]'),
                  ('few long', '[(3,100),(200,200),(600,300)]'),
                  ('many short', '[(2*x,1) for x in range(400)]')]:
    print '**',title,'**'
    print 'dbaupp 1st:', timeit.timeit('np.concatenate([a[offset:offset+length] for offset,length in offset_length])', setup % ol, number=10000)
    print 'dbaupp 2nd:', timeit.timeit('a[[offset + i for offset,length in offset_length for i in xrange(length)]]', setup % ol, number=10000)
    print ' fraxel:', timeit.timeit('a[np.concatenate([np.arange(offset,offset+length) for offset,length in offset_length])]', setup % ol, number=10000)
This outputs:
** few short **
dbaupp 1st: 0.0474979877472
dbaupp 2nd: 0.190793991089
fraxel: 0.128381967545
** few long **
dbaupp 1st: 0.0416231155396
dbaupp 2nd: 1.58000087738
fraxel: 0.228138923645
** many short **
dbaupp 1st: 3.97210478783
dbaupp 2nd: 2.73584890366
fraxel: 7.34302687645
This suggests that my first method is the fastest when you have a few intervals (and it is significantly faster), and my second is the fastest when you have lots of intervals.
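The script above is Python 2 (print statements, xrange). For anyone on Python 3, a rough port might look like the following. The function names and the run_benchmark helper are invented for this sketch, and the numbers it prints will not match the ones above, since they depend on your machine and on the Python and NumPy versions.

```python
import timeit
import numpy as np

a = np.arange(1000)

def by_slices(a, offset_length):
    # "dbaupp 1st": slice each interval and concatenate the pieces.
    return np.concatenate([a[o:o + n] for o, n in offset_length])

def by_index_list(a, offset_length):
    # "dbaupp 2nd": build a flat Python list of indices, then index once.
    return a[[o + i for o, n in offset_length for i in range(n)]]

def by_arange(a, offset_length):
    # "fraxel": concatenate one np.arange per interval, then index once.
    return a[np.concatenate([np.arange(o, o + n) for o, n in offset_length])]

cases = {
    'few short': [(3, 10), (50, 3), (60, 10), (95, 1)],
    'few long': [(3, 100), (200, 200), (600, 300)],
    'many short': [(2 * x, 1) for x in range(400)],
}

def run_benchmark(number=1000):
    # Returns {(case title, function name): seconds for `number` calls}.
    results = {}
    for title, ol in cases.items():
        for func in (by_slices, by_index_list, by_arange):
            seconds = timeit.timeit(lambda: func(a, ol), number=number)
            results[(title, func.__name__)] = seconds
    return results

if __name__ == '__main__':
    for (title, name), seconds in run_benchmark().items():
        print('{:<12} {:<14} {:.4f}'.format(title, name, seconds))
```

Raise number if you want steadier timings; it is kept small here so the script finishes quickly.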