Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy: split 1D array of chunks separated by nans into a list of the chunks

Tags:

python

numpy

I have a numpy array with only some values being valid and the rest being nan. example:

[nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8]

I would like to split it into a list of chunks containing every time the valid data. The result would be

[[1,2,3], [10,11], [23,1], [7,8]]

I managed to get it done by iterating over the array, checking isfinite() and producing (start,stop) indexes.

However... It is painfully slow...

Do you perhaps have a better idea?

like image 239
ronszon Avatar asked Jan 30 '13 13:01

ronszon


People also ask

How can we divide the one NumPy array into two different array?

NumPy: hsplit() function The hsplit() function is used to split an array into multiple sub-arrays horizontally (column-wise). hsplit is equivalent to split with axis=1, the array is always split along the second axis regardless of the array dimension.

How do you divide a NumPy array into equal parts?

Splitting NumPy Arrays Splitting is reverse operation of Joining. Joining merges multiple arrays into one and Splitting breaks one array into multiple. We use array_split() for splitting arrays, we pass it the array we want to split and the number of splits.

How do you split a multidimensional array in Python?

An array needs to explicitly import the array module for declaration. A 2D array is simply an array of arrays. The numpy. array_split() method in Python is used to split a 2D array into multiple sub-arrays of equal size.

How do you split the element of a given NumPy array with spaces?

To split the elements of a given array with spaces we will use numpy. char. split(). It is a function for doing string operations in NumPy.


2 Answers

Here is another possibility:

import numpy as np
nan = np.nan

def using_clump(a):
    return [a[s] for s in np.ma.clump_unmasked(np.ma.masked_invalid(a))]

x = [nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8]

In [56]: using_clump(x)
Out[56]: 
[array([ 1.,  2.,  3.]),
 array([ 10.,  11.]),
 array([ 23.,   1.]),
 array([ 7.,  8.])]

Some benchmarks comparing using_clump and using_groupby:

import itertools as IT
groupby = IT.groupby
def using_groupby(a):
    return [list(v) for k,v in groupby(a,np.isfinite) if k]

In [58]: %timeit using_clump(x)
10000 loops, best of 3: 37.3 us per loop

In [59]: %timeit using_groupby(x)
10000 loops, best of 3: 53.1 us per loop

The performance is even better for larger arrays:

In [9]: x = x*1000
In [12]: %timeit using_clump(x)
100 loops, best of 3: 5.69 ms per loop

In [13]: %timeit using_groupby(x)
10 loops, best of 3: 60 ms per loop
like image 57
unutbu Avatar answered Oct 06 '22 22:10

unutbu


I'd use itertools.groupby -- It might be slightly faster:

from numpy import NaN as nan
import numpy as np
a = np.array([nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8])
from itertools import groupby
result = [list(v) for k,v in groupby(a,np.isfinite) if k]
print result #[[1.0, 2.0, 3.0], [10.0, 11.0], [23.0, 1.0], [7.0, 8.0]]
like image 20
mgilson Avatar answered Oct 06 '22 21:10

mgilson