"as of" in numpy

I am looking for a way to implement an "as of" operator in numpy. Specifically, if:

  1. t1 is an n-vector of timestamps in a strictly increasing order;
  2. d1 is an n x p matrix of observations, with i-th row corresponding to t1[i];
  3. t2 is an m-vector of timestamps, also in a strictly increasing order;

I need to create an m x p matrix d2, where d2[i] is simply d1[j] for the largest value of j such that t1[j] <= t2[i].

In other words, I need to get the rows of d1 as of the timestamps in t2.

It is easy to write this in pure Python, but I am wondering if there's a way to avoid having interpreted loops (n, m and p are quite large).

The timestamps are datetime.datetime objects. The observations are floating-point values.

edit: For entries where t1[j] <= t2[i] can't be satisfied (i.e. where a timestamp in t2 precedes all timestamps in t1), I would ideally like to get rows of NaNs.

NPE asked May 05 '11 13:05



1 Answer

Your best choice is numpy.searchsorted():

d1[numpy.searchsorted(t1, t2, side="right") - 1]

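This finds the indices at which the values of t2 would have to be inserted into t1 to maintain order. With side="right", each index points just past the last element of t1 that is <= t2[i], so subtracting 1 gives the largest j such that t1[j] <= t2[i], which is exactly the specified behaviour.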
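As a minimal sketch of how the index arithmetic works out, here is the same expression on a small example. The integer timestamps and the values in d1 are made up for illustration; only numpy.searchsorted() and the indexing expression come from the answer above.

```python
import numpy as np

# Hypothetical data: integer "timestamps" stand in for real ones.
t1 = np.array([10, 20, 30])                 # strictly increasing
d1 = np.array([[1.0, 1.5],
               [2.0, 2.5],
               [3.0, 3.5]])                 # row i belongs to t1[i]
t2 = np.array([10, 15, 25, 40])             # strictly increasing

# side="right" returns the insertion point after any equal element,
# so subtracting 1 yields the last j with t1[j] <= t2[i].
idx = np.searchsorted(t1, t2, side="right") - 1
d2 = d1[idx]

print(idx)  # [0 0 1 2]
print(d2)
```

Note that a timestamp in t2 that precedes all of t1 would produce index -1 here, which NumPy interprets as the last row of d1, so this form is only safe when t2[0] >= t1[0].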

Edit: To get rows of NaNs where the condition t1[j] <= t2[i] can't be satisfied, you could use

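# Prepend a row of NaNs so that index 0 means "no earlier timestamp".
nan_row = numpy.repeat(numpy.nan, d1.shape[1])
d1_nan = numpy.vstack((nan_row, d1))
# Without the "- 1", index k selects d1[k - 1] and index 0 selects the NaN row.
d2 = d1_nan[numpy.searchsorted(t1, t2, side="right")]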
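The NaN-padding approach can be wrapped up as a small function and tried on datetime.datetime inputs, as in the question. This is only a sketch: the name as_of and the sample data are hypothetical, and converting the timestamps to datetime64[us] is an assumption made here so that the search runs on a native NumPy dtype rather than on Python objects.

```python
import datetime as dt
import numpy as np

def as_of(t1, d1, t2):
    """Return rows of d1 as of the timestamps in t2 (hypothetical helper)."""
    # Assumption: naive datetime.datetime inputs, converted to datetime64.
    t1 = np.asarray(t1, dtype="datetime64[us]")
    t2 = np.asarray(t2, dtype="datetime64[us]")
    d1 = np.asarray(d1, dtype=float)
    # Row 0 of the padded matrix is all NaN; row k is d1[k - 1].
    nan_row = np.full((1, d1.shape[1]), np.nan)
    padded = np.vstack((nan_row, d1))
    # Insertion index 0 means "before every t1", which maps to the NaN row.
    return padded[np.searchsorted(t1, t2, side="right")]

# Made-up sample data for illustration.
t1 = [dt.datetime(2011, 5, 5, 9), dt.datetime(2011, 5, 5, 10), dt.datetime(2011, 5, 5, 11)]
d1 = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
t2 = [dt.datetime(2011, 5, 5, 8, 30), dt.datetime(2011, 5, 5, 10),
      dt.datetime(2011, 5, 5, 10, 45), dt.datetime(2011, 5, 5, 12)]

d2 = as_of(t1, d1, t2)
print(d2)
```

In this example the first row of d2 is all NaN because 08:30 precedes every timestamp in t1; the remaining rows are d1[1], d1[1] and d1[2].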
Sven Marnach answered Oct 08 '22 21:10