Inspired by Haskell:
How can I implement the following with a numpy array in Python?
In [13]: [(x if x>3 else None) for x in range(10)]
Out[13]: [None, None, None, None, 4, 5, 6, 7, 8, 9]
In other words, I am looking for a NumPy function that would have the Haskell signature f :: [a] -> (a -> Maybe a) -> [Maybe a], where [a] would be a NumPy array.
I was trying this:
np.apply_along_axis(lambda x:x if x>3 else None,0,np.arange(10))
but it does not work:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
NumPy's where() will do the trick:
In [429]: import numpy as np
In [430]: arr = np.arange(10, dtype=object)  # the np.object alias was removed in NumPy 1.24
In [431]: np.where(arr > 3, arr, None)
Out[431]: array([None, None, None, None, 4, 5, 6, 7, 8, 9], dtype=object)
The code above creates a new array. If you wish to modify arr in place, you can use boolean indexing, arr[arr < 4] = None (as pointed out by @Chris Mueller), or putmask():
In [432]: np.putmask(arr, arr < 4, None)
In [433]: arr
Out[433]: array([None, None, None, None, 4, 5, 6, 7, 8, 9], dtype=object)
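To make the difference between the two approaches concrete, here is a minimal sketch. The variable names replaced and untouched are only illustrative; the calls are the same ones used above.

```python
import numpy as np

arr = np.arange(10, dtype=object)

# np.where builds a new array; arr itself is left as it was.
replaced = np.where(arr > 3, arr, None)
untouched = arr.tolist() == list(range(10))

# Boolean-mask assignment writes into arr directly.
arr[arr < 4] = None

print(untouched)   # whether arr survived the np.where call unchanged
print(replaced)
print(arr)
```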
Unless you are constrained to use None as a "flag" value, I would suggest following @ev-br's recommendation and using np.nan instead. I will take that approach to assess performance:
In [434]: arr = np.arange(1000000, dtype=float)  # the np.float alias was removed in NumPy 1.24
In [435]: timeit np.where(arr > 3, arr, np.nan)
100 loops, best of 3: 3.61 ms per loop
In [436]: timeit arr[arr < 4] = np.nan
1000 loops, best of 3: 564 µs per loop
In [437]: timeit np.putmask(arr, arr < 4, np.nan)
1000 loops, best of 3: 1.08 ms per loop
Notice that I used a much larger array to further highlight efficiency differences. And the winner is... boolean indexing.
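Outside IPython, a comparable measurement can be sketched with the standard timeit module. The helper names (with_where, with_mask, with_putmask) are made up for this example, and each in-place variant works on a fresh copy so every call starts from the same data. That copy adds its own cost, so the figures will not match the ones quoted above and will vary by machine and NumPy version.

```python
import timeit
import numpy as np

base = np.arange(1_000_000, dtype=float)

def with_where():
    # Allocates and returns a new array.
    return np.where(base > 3, base, np.nan)

def with_mask():
    # In-place boolean indexing on a copy of the input.
    arr = base.copy()
    arr[arr < 4] = np.nan
    return arr

def with_putmask():
    # In-place putmask on a copy of the input.
    arr = base.copy()
    np.putmask(arr, arr < 4, np.nan)
    return arr

for fn in (with_where, with_mask, with_putmask):
    # Best of 3 repeats, 5 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```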
I'd recommend revisiting the premise that you want Nones in a numpy array: you need an array of dtype=object to store a None, i.e. you'll be storing Python objects in the array. That way you lose most of the advantages that numpy provides over plain lists.
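As a small illustration of that cost (a sketch with made-up values, not a benchmark): once None is stored, the array holds generic Python objects, and even a plain reduction has to fall back to Python-level addition, which fails on None.

```python
import numpy as np

obj = np.array([None, None, 4, 5], dtype=object)
print(obj.dtype)  # object: elements are arbitrary Python objects

try:
    obj.sum()
    sum_failed = False
except TypeError as exc:
    # Adding None to anything is not defined in Python.
    sum_failed = True
    print("sum() failed:", exc)
```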
If what you want is a sentinel value to signal "not available" or "not known", and the other values can be floating-point numbers, then you're better off using np.nan:
>>> x = np.arange(10, dtype=float)
>>> x[x < 3] = np.nan
>>> x
array([ nan, nan, nan, 3., 4., 5., 6., 7., 8., 9.])
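When working with NaN as the sentinel, note that NaN never compares equal to anything, itself included, so the missing entries are found with np.isnan rather than ==. A minimal sketch continuing the example above (the names missing, valid and mean_valid are only illustrative):

```python
import numpy as np

x = np.arange(10, dtype=float)
x[x < 3] = np.nan

# x == np.nan is False everywhere, so use np.isnan to locate the sentinels.
missing = np.isnan(x)
n_missing = int(missing.sum())

# Keep only the entries that hold real values.
valid = x[~missing]

# NaN-aware reductions skip the sentinels; a plain mean would return nan.
mean_valid = np.nanmean(x)

print(n_missing, valid, mean_valid)
```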