numpy: split 1D array of chunks separated by nans into a list of the chunks

Tags:

python

numpy

I have a NumPy array in which only some values are valid and the rest are nan. For example:

[nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8]

I would like to split it into a list of chunks, each containing one contiguous run of valid data. The result would be:

[[1,2,3], [10,11], [23,1], [7,8]]

I managed to get it done by iterating over the array, checking isfinite() and producing (start,stop) indexes.
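
For reference, a loop of that kind might look roughly like the sketch below. It is a reconstruction for illustration only, not the original code, and the name split_with_loop is hypothetical.

```python
import numpy as np

def split_with_loop(a):
    """Find (start, stop) bounds of finite runs with a plain Python loop."""
    a = np.asarray(a, dtype=float)
    bounds = []
    start = None  # index where the current finite run began, if any
    for i, ok in enumerate(np.isfinite(a)):
        if ok and start is None:
            start = i                   # a finite run begins here
        elif not ok and start is not None:
            bounds.append((start, i))   # the run ended just before i
            start = None
    if start is not None:
        bounds.append((start, len(a)))  # the array ended inside a run
    return [a[lo:hi] for lo, hi in bounds]
```

Every element is visited in the Python interpreter, which is the likely source of the slowness on large arrays.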

However... It is painfully slow...

Do you perhaps have a better idea?

239

asked Jan 30 '13 13:01

ronszon

2 Answers

Here is another possibility:

import numpy as np
nan = np.nan

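def using_clump(a):
    # Convert first, so that list input is sliced into arrays as in the output below.
    a = np.asarray(a, dtype=float)
    return [a[s] for s in np.ma.clump_unmasked(np.ma.masked_invalid(a))]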

x = [nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8]

In [56]: using_clump(x)
Out[56]: 
[array([ 1.,  2.,  3.]),
 array([ 10.,  11.]),
 array([ 23.,   1.]),
 array([ 7.,  8.])]
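
To see why this works, it may help to look at the two steps separately. The sketch below is only an illustration on the example data: np.ma.masked_invalid masks the nan (and inf) entries, and np.ma.clump_unmasked returns one slice per contiguous run of unmasked values. The variable names masked, slices and bounds are introduced here for illustration.

```python
import numpy as np

nan = np.nan
x = np.array([nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8])

masked = np.ma.masked_invalid(x)       # nan/inf entries become masked
slices = np.ma.clump_unmasked(masked)  # one slice object per run of valid values

# Plain-int (start, stop) pairs, convenient for inspection.
bounds = [(int(s.start), int(s.stop)) for s in slices]
print(bounds)  # (2, 5), (7, 9), (12, 14), (15, 17)

chunks = [x[s] for s in slices]        # slicing yields views into x, not copies
```

Since the chunks are views, writing into one of them also changes x; call .copy() on a chunk if that is not wanted.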

Some benchmarks comparing using_clump and using_groupby:

import itertools as IT
groupby = IT.groupby
def using_groupby(a):
    return [list(v) for k,v in groupby(a,np.isfinite) if k]

In [58]: %timeit using_clump(x)
10000 loops, best of 3: 37.3 us per loop

In [59]: %timeit using_groupby(x)
10000 loops, best of 3: 53.1 us per loop

The advantage of using_clump grows for larger inputs (x is a Python list here, so x*1000 repeats it 1000 times):

In [9]: x = x*1000
In [12]: %timeit using_clump(x)
100 loops, best of 3: 5.69 ms per loop

In [13]: %timeit using_groupby(x)
10 loops, best of 3: 60 ms per loop
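
Outside IPython, a comparison like this could be reproduced with the standard timeit module. The script below is a minimal sketch: it assumes the two functions as defined in this answer (with an np.asarray conversion added to using_clump so that list input yields arrays), and the numbers it prints will differ by machine and NumPy version.

```python
import timeit
from itertools import groupby

import numpy as np

nan = np.nan

def using_clump(a):
    a = np.asarray(a, dtype=float)  # assumption: convert so list input works
    return [a[s] for s in np.ma.clump_unmasked(np.ma.masked_invalid(a))]

def using_groupby(a):
    return [list(v) for k, v in groupby(a, np.isfinite) if k]

# The example list repeated 1000 times (list repetition, not multiplication).
x = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8] * 1000

repeats = 5
for func in (using_clump, using_groupby):
    seconds = timeit.timeit(lambda: func(x), number=repeats) / repeats
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```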

answered Oct 06 '22 22:10

unutbu

I'd use itertools.groupby. It might be slightly faster than a hand-written loop:

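import numpy as np
from itertools import groupby

nan = np.nan  # the np.NaN alias was removed in NumPy 2.0
a = np.array([nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8])
# Group consecutive elements by finiteness and keep only the finite runs.
result = [list(v) for k,v in groupby(a,np.isfinite) if k]
print(result)  # [[1.0, 2.0, 3.0], [10.0, 11.0], [23.0, 1.0], [7.0, 8.0]] (NumPy >= 2 prints each value as np.float64(...))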
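
As a small illustration of what groupby does here: it walks the sequence once and starts a new group each time the key function's result changes. The sketch below uses math.isfinite on plain Python floats instead of np.isfinite, purely to keep the printed values simple; the names data, runs and chunks are introduced for illustration.

```python
import math
from itertools import groupby

data = [math.nan, 1.0, 2.0, math.nan, math.nan, 3.0]

# Each (key, length) pair is one run of consecutive items sharing the same key.
runs = [(key, len(list(group))) for key, group in groupby(data, math.isfinite)]
print(runs)    # [(False, 1), (True, 2), (False, 2), (True, 1)]

# Keeping only the groups whose key is True leaves the finite chunks.
chunks = [list(group) for key, group in groupby(data, math.isfinite) if key]
print(chunks)  # [[1.0, 2.0], [3.0]]
```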

answered Oct 06 '22 21:10

mgilson

