
map with failure in Numpy

Inspired by Haskell:

How can I implement the following with a numpy array in Python?

In [13]: [(x if x>3 else None) for x in range(10)]
Out[13]: [None, None, None, None, 4, 5, 6, 7, 8, 9]

In other words, I am looking for a NumPy function whose Haskell signature would be f :: [a] -> (a -> Maybe a) -> [Maybe a], where [a] would be a NumPy array.

I was trying this:

np.apply_along_axis(lambda x:x if x>3 else None,0,np.arange(10))

but it does not work:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
jhegedus asked Jun 16 '26 11:06


2 Answers

NumPy's where() will do the trick:

In [429]: import numpy as np

In [430]: arr = np.arange(10, dtype=object)

In [431]: np.where(arr > 3, arr, None)
Out[431]: array([None, None, None, None, 4, 5, 6, 7, 8, 9], dtype=object)

The code above creates a new array. If you wish to modify arr in place, you could use boolean indexing arr[arr < 4] = None (as pointed out by @Chris Mueller) or putmask():

In [432]: np.putmask(arr, arr < 4, None)

In [433]: arr
Out[433]: array([None, None, None, None, 4, 5, 6, 7, 8, 9], dtype=object)
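
When the condition cannot be written as a vectorized comparison, one option for a generic "map with failure" is np.frompyfunc. What follows is a minimal sketch rather than a recommended approach: map_maybe is a hypothetical helper name, and because the function is called once per element in Python, it is expected to be much slower than np.where on large arrays.

```python
import numpy as np

def map_maybe(func, arr):
    """Apply func to every element; func returns a value or None (hypothetical helper)."""
    # np.frompyfunc(func, nin, nout) wraps a Python callable as a ufunc
    # whose output is always an array of dtype=object.
    return np.frompyfunc(func, 1, 1)(arr)

result = map_maybe(lambda x: x if x > 3 else None, np.arange(10))
print(result)
```

The result has dtype=object, just like the np.where version above, so the same caveats about storing Python objects in an array apply.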

Unless you are constrained to use None as a "flag" value, I would suggest following @ev-br's recommendation and using np.nan instead. I will take that approach to assess performance:

In [434]: arr = np.arange(1000000, dtype=float)

In [435]: timeit np.where(arr > 3, arr, np.nan)
100 loops, best of 3: 3.61 ms per loop

In [436]: timeit arr[arr < 4] = np.nan
1000 loops, best of 3: 564 µs per loop

In [437]: timeit np.putmask(arr, arr < 4, np.nan)
1000 loops, best of 3: 1.08 ms per loop

Notice that I used a much larger array to further highlight efficiency differences. And the winner is... boolean indexing.
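
To repeat the comparison outside IPython, the standard timeit module can be used. This is only a sketch: the function names are made up for illustration, each approach runs on its own copy of the array so that the in-place variants do not see each other's NaNs, and absolute numbers will differ between machines and NumPy versions.

```python
import timeit
import numpy as np

base = np.arange(1_000_000, dtype=float)

def with_where(a):
    return np.where(a > 3, a, np.nan)   # builds and returns a new array

def with_indexing(a):
    a[a < 4] = np.nan                   # boolean indexing, modifies a in place
    return a

def with_putmask(a):
    np.putmask(a, a < 4, np.nan)        # also modifies a in place
    return a

timings = {}
for func in (with_where, with_indexing, with_putmask):
    a = base.copy()  # fresh copy for each approach
    best = min(timeit.repeat(lambda: func(a), number=10, repeat=3))
    timings[func.__name__] = best / 10  # seconds per call

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```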

Tonechas answered Jun 18 '26 02:06


I'd recommend revisiting the premise that you want Nones in a numpy array: you need an array of dtype=object to store a None, i.e. you'll be storing Python objects in the array. That way you lose most of the advantages that numpy provides over plain lists.

If what you want is a sentinel value to signal "not available" or "not known", and you can store the other values as floating-point numbers, then you're better off using np.nan:

>>> x = np.arange(10, dtype=float)
>>> x[x < 3] = np.nan
>>> x
array([ nan,  nan,  nan,   3.,   4.,   5.,   6.,   7.,   8.,   9.])
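
Once np.nan is the sentinel, the missing entries can be located with np.isnan and skipped with the NaN-aware reductions. A small sketch along those lines, reusing the array from the example above (the variable name missing is only for illustration):

```python
import numpy as np

x = np.arange(10, dtype=float)
x[x < 3] = np.nan

missing = np.isnan(x)      # boolean mask of the "not available" slots
print(missing.sum())       # number of missing entries
print(x[~missing])         # only the valid values
print(np.nanmean(x))       # mean computed over the valid values only
print(np.mean(x))          # the plain mean propagates NaN
```

Note that NaN compares unequal to everything, including itself, so x == np.nan does not find the missing entries; np.isnan is the check to use.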
ev-br answered Jun 18 '26 00:06


