
How to filter a numpy.ndarray by date?

I have a 2d numpy.array, where the first column contains datetime.datetime objects, and the second column integers:

A = array([[2002-03-14 19:57:38, 197],
       [2002-03-17 16:31:33, 237],
       [2002-03-17 16:47:18, 238],
       [2002-03-17 18:29:31, 239],
       [2002-03-17 20:10:11, 240],
       [2002-03-18 16:18:08, 252],
       [2002-03-23 23:44:38, 327],
       [2002-03-24 09:52:26, 334],
       [2002-03-25 16:04:21, 352],
       [2002-03-25 18:53:48, 353]], dtype=object)

What I would like to do is select all rows for a specific date, something like

A[first_column.date()==datetime.date(2002,3,17)]
array([[2002-03-17 16:31:33, 237],
       [2002-03-17 16:47:18, 238],
       [2002-03-17 18:29:31, 239],
       [2002-03-17 20:10:11, 240]], dtype=object)

How can I achieve this?

Thanks for your insight :)

asked Aug 28 '10 21:08 by andreas-h




1 Answer

You could do this:

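import datetime

# Half-open window [from_date, to_date): midnight starting 2002-03-17 is
# included, midnight starting 2002-03-18 is excluded.
from_date=datetime.datetime(2002,3,17,0,0,0)
to_date=from_date+datetime.timedelta(days=1)
idx=(A[:,0]>=from_date) & (A[:,0]<to_date)
print(A[idx])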
# array([[2002-03-17 16:31:33, 237],
#        [2002-03-17 16:47:18, 238],
#        [2002-03-17 18:29:31, 239],
#        [2002-03-17 20:10:11, 240]], dtype=object)

A[:,0] is the first column of A.
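To try this end to end, here is a minimal self-contained sketch that rebuilds the array from the question and applies the same kind of mask. The helper name rows_in_window is made up for illustration; it is not part of NumPy.

```python
import datetime

import numpy as np

# Rebuild the array from the question: datetimes in column 0, ints in column 1.
rows = [
    (datetime.datetime(2002, 3, 14, 19, 57, 38), 197),
    (datetime.datetime(2002, 3, 17, 16, 31, 33), 237),
    (datetime.datetime(2002, 3, 17, 16, 47, 18), 238),
    (datetime.datetime(2002, 3, 17, 18, 29, 31), 239),
    (datetime.datetime(2002, 3, 17, 20, 10, 11), 240),
    (datetime.datetime(2002, 3, 18, 16, 18, 8), 252),
    (datetime.datetime(2002, 3, 23, 23, 44, 38), 327),
    (datetime.datetime(2002, 3, 24, 9, 52, 26), 334),
    (datetime.datetime(2002, 3, 25, 16, 4, 21), 352),
    (datetime.datetime(2002, 3, 25, 18, 53, 48), 353),
]
A = np.array(rows, dtype=object)


def rows_in_window(arr, start, stop):
    """Hypothetical helper: rows whose first column lies in [start, stop)."""
    first_col = arr[:, 0]
    # Elementwise comparisons give boolean arrays; & combines them elementwise.
    mask = (first_col >= start) & (first_col < stop)
    return arr[mask]


day_start = datetime.datetime(2002, 3, 17)
selected = rows_in_window(A, day_start, day_start + datetime.timedelta(days=1))
print(selected[:, 1])
```

The half-open window keeps a timestamp of exactly midnight on the requested day and leaves out midnight of the following day.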

Unfortunately, comparing A[:,0] with a datetime.date object raises a TypeError. However, comparison with a datetime.datetime object works:

In [63]: A[:,0]>datetime.datetime(2002,3,17,0,0,0)
Out[63]: array([False,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)
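If you would rather match on the calendar date itself, as in the question, one option is to call .date() on each element and build the mask in plain Python. This is only a sketch on a shortened copy of the data; the loop runs at Python speed, so it is likely slower than a vectorized comparison on large arrays.

```python
import datetime

import numpy as np

# Shortened copy of the question's data, for illustration only.
A = np.array([
    [datetime.datetime(2002, 3, 14, 19, 57, 38), 197],
    [datetime.datetime(2002, 3, 17, 16, 31, 33), 237],
    [datetime.datetime(2002, 3, 17, 20, 10, 11), 240],
    [datetime.datetime(2002, 3, 18, 16, 18, 8), 252],
], dtype=object)

target = datetime.date(2002, 3, 17)

# Compare date with date, one element at a time, then wrap the result
# in a boolean array so it can be used as a mask.
same_day = np.array([stamp.date() == target for stamp in A[:, 0]], dtype=bool)
matches = A[same_day]
print(matches[:, 1])
```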

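Also, unfortunately, the chained comparison

datetime.datetime(2002,3,17,0,0,0)<=A[:,0]<datetime.datetime(2002,3,18,0,0,0)

fails too. When this answer was written it raised a TypeError, since it called datetime.datetime's comparison method instead of the numpy array's. On current Python 3 and NumPy the failure is typically a ValueError instead: Python evaluates a<=x<b as (a<=x) and (x<b), and "and" needs a single truth value, which a multi-element array does not have.

Anyway, it's not hard to work around; combine the two comparisons with & instead:

In [69]: (A[:,0]>=datetime.datetime(2002,3,17,0,0,0)) & (A[:,0]<datetime.datetime(2002,3,18,0,0,0))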
Out[69]: array([False,  True,  True,  True,  True, False, False, False, False, False], dtype=bool)

Since this gives you a boolean array, you can use it as a "fancy index" to A, which yields the desired result.
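Another possibility, not part of the approach above, is to convert the first column to NumPy's datetime64 type and truncate it to whole days, so the comparison is against a single date value. The sketch below assumes every entry in the first column is a timezone-naive datetime.datetime, and it uses a shortened copy of the data.

```python
import datetime

import numpy as np

# Shortened copy of the question's data, for illustration only.
A = np.array([
    [datetime.datetime(2002, 3, 14, 19, 57, 38), 197],
    [datetime.datetime(2002, 3, 17, 16, 31, 33), 237],
    [datetime.datetime(2002, 3, 17, 20, 10, 11), 240],
    [datetime.datetime(2002, 3, 18, 16, 18, 8), 252],
], dtype=object)

# Assumption: column 0 holds only timezone-naive datetime.datetime objects.
# Convert at microsecond resolution first, then truncate to whole days.
stamps = A[:, 0].astype("datetime64[us]")
days = stamps.astype("datetime64[D]")

mask = days == np.datetime64("2002-03-17")
picked = A[mask]
print(picked[:, 1])
```

The mask is an ordinary boolean array, so it indexes the original object array just like the masks above.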

answered Sep 29 '22 16:09 by unutbu