I have an array as follows:
In [1]: x = array(['1.2', '2.3', '1.2.3'])
I want to test whether each element in the array can be converted into a numerical value. That is, a function is_numeric(x) should return a True/False array as follows:
In [2]: is_numeric(x)
Out[2]: array([True, True, False])
How can I do this?
With a NumPy array, we can easily check whether specific values are present by using the "in" operator. The "in" operator tests whether an element is present in a given sequence and returns a Boolean value, True or False.
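As a small illustration of the membership test described above, here is a sketch using the array from the question. Note that it only answers "is this exact value in the array?"; it does not test whether an element can be converted to a number. The use of np.isin for an element-wise variant is my own addition, not something from the original post.

```python
import numpy as np

x = np.array(['1.2', '2.3', '1.2.3'])

# Scalar membership: is this exact string one of the elements?
has_value = '2.3' in x
missing = 'abc' in x

# Element-wise membership against a list of candidate values.
elementwise = np.isin(x, ['1.2', '9.9'])

print(has_value, missing, elementwise)
```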
NumPy's isnumeric() function (numpy.char.isnumeric) returns True for each element that contains only numeric characters. Parameter: a, the input array. Return value: out, an ndarray of booleans with the same shape as a.
The string method isnumeric() returns True if all the characters are numeric, otherwise False. Characters such as ² and ¾ are also considered numeric, but a decimal point is not, so '1.2'.isnumeric() is False.
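To make the difference from the question's requirement concrete, the following sketch applies np.char.isnumeric to a few sample strings. The sample values are my own; the point is only that a character-class check is not the same as "can be converted to a number".

```python
import numpy as np

samples = np.array(['12', '1.2', '²', '1.2.3'])

# np.char.isnumeric calls str.isnumeric on every element.
digits_only = np.char.isnumeric(samples)

# '12' and '²' consist solely of numeric characters;
# '1.2' and '1.2.3' contain '.', which is not a numeric character.
print(digits_only)
```

So '1.2', which the question wants reported as numeric, is rejected by this check.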
Use bincount() to count True elements in a NumPy array. In Python, the numpy module provides a function bincount(arr), which returns the number of occurrences of each value in an array of non-negative ints.
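A minimal sketch of that counting step, assuming a Boolean mask like the one is_numeric should return. The explicit cast to an integer type and the minlength=2 argument are my choices, so that both counts exist even when the mask is all False.

```python
import numpy as np

mask = np.array([True, True, False])

# Index 0 holds the number of False values, index 1 the number of True values.
counts = np.bincount(mask.astype(np.intp), minlength=2)
n_false, n_true = counts

# np.count_nonzero gives the number of True values directly.
n_true_direct = np.count_nonzero(mask)

print(n_false, n_true, n_true_direct)
```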
I find the following works well for my purpose.
First, save the isNumeric function from https://rosettacode.org/wiki/Determine_if_a_string_is_numeric#C in a file called ctest.h, then create a .pyx file as follows:
from numpy cimport ndarray, uint8_t
import numpy as np
cimport numpy as np

cdef extern from "ctest.h":
    int isNumeric(const char * s)

def is_numeric_elementwise(ndarray x):
    cdef Py_ssize_t i
    cdef ndarray[uint8_t, mode='c', cast=True] y = np.empty_like(x, dtype=np.uint8)
    for i in range(x.size):
        y[i] = isNumeric(x[i])
    return y > 0
The above Cython function runs quite fast.
In [4]: is_numeric_elementwise(array(['1.2', '2.3', '1.2.3']))
Out[4]: array([ True, True, False], dtype=bool)
In [5]: %timeit is_numeric_elementwise(array(['1.2', '2.3', '1.2.3'] * 1000000))
1 loops, best of 3: 695 ms per loop
Compared with the is_numeric_3 method in https://stackoverflow.com/a/37997673/4909242, it is about 5 times faster.
In [6]: %timeit is_numeric_3(array(['1.2', '2.3', '1.2.3'] * 1000000))
1 loops, best of 3: 3.45 s per loop
There may still be room for improvement, I guess.
import numpy as np

def is_float(val):
    """Return True if val can be converted to float."""
    try:
        float(val)
    except ValueError:
        return False
    else:
        return True

a = np.array(['1.2', '2.3', '1.2.3'])

# On Python 3, map() returns an iterator, so wrap it in list().
is_numeric_1 = lambda x: list(map(is_float, x))            # return python list
is_numeric_2 = lambda x: np.array(list(map(is_float, x)))  # return numpy array
is_numeric_3 = np.vectorize(is_float, otypes=[bool])       # return numpy array
Depending on the size of the array and the type of the returned value, these functions run at different speeds.
In [26]: %timeit is_numeric_1(a)
100000 loops, best of 3: 2.34 µs per loop
In [27]: %timeit is_numeric_2(a)
100000 loops, best of 3: 3.13 µs per loop
In [28]: %timeit is_numeric_3(a)
100000 loops, best of 3: 6.7 µs per loop
In [29]: a = np.array(['1.2', '2.3', '1.2.3']*1000)
In [30]: %timeit is_numeric_1(a)
1000 loops, best of 3: 1.53 ms per loop
In [31]: %timeit is_numeric_2(a)
1000 loops, best of 3: 1.6 ms per loop
In [32]: %timeit is_numeric_3(a)
1000 loops, best of 3: 1.58 ms per loop
If a list is okay, use is_numeric_1.
If you want a numpy array and the size of a is small, use is_numeric_2.
Otherwise, use is_numeric_3.
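For reference, the float()-based approach can be wrapped into a single is_numeric helper. This is only a sketch, assuming Python 3 and NumPy; np.fromiter is my own choice here and is not one of the variants timed above. It also shows a caveat worth knowing: float() accepts strings such as 'nan', 'inf', '1e5' and ' 1.2 ', so they all count as numeric under this definition.

```python
import numpy as np

def is_float(val):
    """Return True if float(val) succeeds."""
    try:
        float(val)
    except ValueError:
        return False
    return True

def is_numeric(x):
    """Return a Boolean array with the same shape as x."""
    x = np.asarray(x)
    flags = np.fromiter((is_float(v) for v in x.ravel()),
                        dtype=bool, count=x.size)
    return flags.reshape(x.shape)

a = np.array(['1.2', '2.3', '1.2.3'])
result = is_numeric(a)
print(result)

# Strings that float() accepts even though they may not look like plain numbers.
edge_cases = is_numeric(np.array(['nan', 'inf', '1e5', ' 1.2 ']))
print(edge_cases)
```

If values like 'nan' or 'inf' should be rejected, an extra check (for example with math.isfinite on the converted value) would be needed on top of this.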