
Element-wise test of whether each NumPy array element is numeric

I have an array like the following:

In [1]: x = array(['1.2', '2.3', '1.2.3'])

I want to test whether each element in the array can be converted to a numerical value. That is, a function is_numeric(x) should return a True/False array like the following:

In [2]: is_numeric(x)
Out[2]: array([True, True, False])

How can I do this?

Wei Li asked Jun 23 '16 15:06



2 Answers

I find the following works well for my purpose.

First, save the isNumeric function from https://rosettacode.org/wiki/Determine_if_a_string_is_numeric#C in a file called ctest.h, then create a .pyx file as follows:

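from numpy cimport ndarray, uint8_t
import numpy as np
cimport numpy as np

# C helper declared in ctest.h; a nonzero return is treated as "numeric" below
cdef extern from "ctest.h":
    int isNumeric(const char * s)

def is_numeric_elementwise(ndarray x):
    cdef Py_ssize_t i
    # uint8 buffer for the per-element results (the typed buffer assumes x is 1-D)
    cdef ndarray[uint8_t, mode='c', cast=True] y = np.empty_like(x, dtype=np.uint8)

    for i in range(x.size):
        # x[i] must be a byte string so that it coerces to const char *
        y[i] = isNumeric(x[i])

    return y > 0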

The above Cython function runs quite fast.

In [4]: is_numeric_elementwise(array(['1.2', '2.3', '1.2.3']))
Out[4]: array([ True,  True, False], dtype=bool)

In [5]: %timeit is_numeric_elementwise(array(['1.2', '2.3', '1.2.3'] * 1000000))
1 loops, best of 3: 695 ms per loop

Compared with the is_numeric_3 method in https://stackoverflow.com/a/37997673/4909242, it is about 5 times faster.

In [6]: %timeit is_numeric_3(array(['1.2', '2.3', '1.2.3'] * 1000000))
1 loops, best of 3: 3.45 s per loop

There may still be some room for improvement, I guess.
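On Python 3, NumPy stores text as unicode ('U' dtype), which likely will not coerce to const char * in the Cython function above. The following is a minimal sketch of one way to prepare the input, assuming the values are plain ASCII; the helper name to_bytes_array is hypothetical and not part of the original answer.

```python
import numpy as np

def to_bytes_array(x):
    """Return x as a byte-string ('S' dtype) array; hypothetical helper."""
    x = np.asarray(x)
    if x.dtype.kind == "U":
        # Encode unicode elements to bytes; assumes ASCII-only content
        return np.char.encode(x, "ascii")
    return x

a = to_bytes_array(np.array(['1.2', '2.3', '1.2.3']))
print(a.dtype.kind)  # 'S' means byte strings
```

The result could then be passed to is_numeric_elementwise in place of the original array.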

Wei Li answered Sep 18 '22 13:09


import numpy as np

def is_float(val):
    # True if float() can parse val, False otherwise
    try:
        float(val)
    except ValueError:
        return False
    else:
        return True

a = np.array(['1.2', '2.3', '1.2.3'])

# Python 2 semantics: map() returns a list here (on Python 3, wrap it in list(...))
is_numeric_1 = lambda x: map(is_float, x)              # returns a Python list
is_numeric_2 = lambda x: np.array(map(is_float, x))    # returns a NumPy array
is_numeric_3 = np.vectorize(is_float, otypes=[bool])   # returns a NumPy array

Depending on the size of the array and the type of the returned value, these functions run at different speeds.

In [26]: %timeit is_numeric_1(a)
100000 loops, best of 3: 2.34 µs per loop

In [27]: %timeit is_numeric_2(a)
100000 loops, best of 3: 3.13 µs per loop

In [28]: %timeit is_numeric_3(a)
100000 loops, best of 3: 6.7 µs per loop

In [29]: a = np.array(['1.2', '2.3', '1.2.3']*1000)

In [30]: %timeit is_numeric_1(a)
1000 loops, best of 3: 1.53 ms per loop

In [31]: %timeit is_numeric_2(a)
1000 loops, best of 3: 1.6 ms per loop

In [32]: %timeit is_numeric_3(a)
1000 loops, best of 3: 1.58 ms per loop

If a list is acceptable, use is_numeric_1.

If you want a NumPy array and a is small, use is_numeric_2.

Otherwise, use is_numeric_3.
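The same three approaches can be sketched for Python 3, where map() returns an iterator rather than a list. This is an adaptation, not the code that was timed above; the names is_numeric_list, is_numeric_array and is_numeric_vectorized are illustrative, and catching TypeError as well is an added assumption so that non-string inputs such as None return False.

```python
import numpy as np

def is_float(val):
    """Return True if float() can parse val, False otherwise."""
    try:
        float(val)
    except (TypeError, ValueError):
        return False
    return True

def is_numeric_list(x):
    # Plain Python list of booleans
    return [is_float(v) for v in x]

def is_numeric_array(x):
    # Boolean NumPy array with the same shape as the input
    x = np.asarray(x)
    flags = np.fromiter((is_float(v) for v in x.ravel()), dtype=bool, count=x.size)
    return flags.reshape(x.shape)

# np.vectorize is a convenience wrapper around a Python-level loop
is_numeric_vectorized = np.vectorize(is_float, otypes=[bool])

a = np.array(['1.2', '2.3', '1.2.3'])
print(is_numeric_array(a))                          # [ True  True False]
print(is_numeric_list(['nan', '1e3', '', 'abc']))   # [True, True, False, False]
```

Note that float() also accepts strings such as 'nan', 'inf' and '1e3', so these count as numeric under this test.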

dragon2fly answered Sep 19 '22 13:09