I have a NumPy array where only some values are valid and the rest are NaN. For example:
[nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8]
I would like to split it into a list of chunks, each containing one run of the valid data. The result would be
[[1,2,3], [10,11], [23,1], [7,8]]
I managed to get it done by iterating over the array, checking isfinite() and producing (start, stop) indices.
However, it is painfully slow...
Do you perhaps have a better idea?
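For reference, here is a sketch of the kind of pure-Python scan the question describes (the exact original code isn't shown, so this is an assumed reconstruction): walk the array once, accumulate finite values, and close a chunk at each NaN.

```python
import numpy as np

def chunks_by_scanning(a):
    """Collect runs of finite values by scanning element-by-element
    (the slow pure-Python approach described in the question)."""
    result = []
    current = []
    for v in a:
        if np.isfinite(v):
            current.append(float(v))
        elif current:
            result.append(current)
            current = []
    if current:  # flush a trailing run that ends at the array's end
        result.append(current)
    return result

x = np.array([np.nan, np.nan, 1, 2, 3, np.nan, np.nan, 10, 11,
              np.nan, np.nan, np.nan, 23, 1, np.nan, 7, 8])
print(chunks_by_scanning(x))  # [[1.0, 2.0, 3.0], [10.0, 11.0], [23.0, 1.0], [7.0, 8.0]]
```

The per-element Python loop is what makes this slow; the answers below push the work into NumPy instead.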
Here is another possibility:
import numpy as np
nan = np.nan
def using_clump(a):
    return [a[s] for s in np.ma.clump_unmasked(np.ma.masked_invalid(a))]

x = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8]
In [56]: using_clump(x)
Out[56]:
[array([ 1., 2., 3.]),
array([ 10., 11.]),
array([ 23., 1.]),
array([ 7., 8.])]
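To see why this works, note that `np.ma.masked_invalid` masks the NaNs and `np.ma.clump_unmasked` then returns the contiguous unmasked runs as `slice` objects, which `using_clump` uses to index the array:

```python
import numpy as np

x = np.array([np.nan, np.nan, 1, 2, 3, np.nan, 7, 8])
masked = np.ma.masked_invalid(x)       # masks every NaN element
slices = np.ma.clump_unmasked(masked)  # contiguous unmasked runs as slices
print(slices)  # [slice(2, 5, None), slice(6, 8, None)]
```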
Some benchmarks comparing using_clump and using_groupby:
import itertools as IT
groupby = IT.groupby
def using_groupby(a):
    return [list(v) for k, v in groupby(a, np.isfinite) if k]
In [58]: %timeit using_clump(x)
10000 loops, best of 3: 37.3 us per loop
In [59]: %timeit using_groupby(x)
10000 loops, best of 3: 53.1 us per loop
The performance is even better for larger arrays:
In [9]: x = x*1000
In [12]: %timeit using_clump(x)
100 loops, best of 3: 5.69 ms per loop
In [13]: %timeit using_groupby(x)
10 loops, best of 3: 60 ms per loop
I'd use itertools.groupby -- it might be slightly faster:
import numpy as np
from itertools import groupby

nan = np.nan  # np.NaN was removed in NumPy 2.0; use np.nan
a = np.array([nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8])
result = [list(v) for k, v in groupby(a, np.isfinite) if k]
print(result)  # [[1.0, 2.0, 3.0], [10.0, 11.0], [23.0, 1.0], [7.0, 8.0]]
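A third possibility, not from either answer but a common vectorized pattern worth sketching: locate the finite/NaN boundaries with `np.diff` on the finiteness mask and cut the array once with `np.split`, keeping only the finite pieces.

```python
import numpy as np

def using_split(a):
    """Split at every finite/NaN boundary, then keep only the finite runs."""
    mask = np.isfinite(a)
    # positions where the mask flips between NaN-run and finite-run
    edges = np.flatnonzero(np.diff(mask.astype(int))) + 1
    return [c for c in np.split(a, edges) if c.size and np.isfinite(c[0])]

a = np.array([np.nan, np.nan, 1, 2, 3, np.nan, 10, 11])
print(using_split(a))
```

This avoids any per-element Python loop; the chunks come back as NumPy array views rather than lists.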