
Generating an evenly sampled array from unevenly sampled data in NumPy

I want to reduce the amount of data for plots and analysis. I'm using Python and NumPy. The data is unevenly sampled, so there is an array of timestamps and an array of corresponding values. I want at least a certain amount of time between consecutive data points. Here is a simple solution written in Python that finds the indices at which at least one second has passed since the previously kept sample:

import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0 ]) # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0 ])

idx = [0]
last_t = t[0]
min_dif = 1.0 # Minimum distance between samples in time
for i in range(1, len(t)):
    if last_t + min_dif <= t[i]:
        last_t = t[i]
        idx.append(i)

If we look at the result:

--> print idx
[0, 4, 5, 6, 9]

--> print t[idx]
[ 0.  1.  2.  4.  5.]
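
For long arrays, one possible variation on the loop above is to jump straight to the next qualifying sample with np.searchsorted, so the Python loop runs once per kept sample instead of once per input sample. This is only a sketch: the helper name thin_indices is hypothetical, and it assumes t is non-empty, sorted in ascending order, and that min_dif is positive.

```python
import numpy as np

def thin_indices(t, min_dif):
    """Hypothetical helper: greedy selection of indices so that kept
    timestamps are at least min_dif apart. Assumes t is non-empty,
    sorted ascending, and min_dif > 0."""
    idx = [0]
    i = 0
    n = len(t)
    while True:
        # First position whose timestamp is >= t[i] + min_dif
        i = int(np.searchsorted(t, t[i] + min_dif, side="left"))
        if i >= n:
            break
        idx.append(i)
    return np.array(idx)

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
idx = thin_indices(t, 1.0)
```

For the example data this selects the same indices as the loop. The gain depends on how many samples are discarded; if almost every sample is kept, it is no faster than the plain loop.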

How can this be done more efficiently, especially if the arrays are really long? Are there built-in NumPy or SciPy methods that do something similar?

J. P. Petersen, asked Aug 21 '12 13:08


2 Answers

While, like @1443118, I'd suggest using pandas, you may want to try something with np.histogram.

First, build the bin edges (intervals of min_dif seconds) you would need:

>>> bins = np.arange(t[0], t[-1]+min_dif, min_dif) - 1e-12

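The stop value t[-1]+min_dif makes the edges run up to t[-1]. The -1e-12 is a hack: it shifts every edge slightly to the left, so that a sample sitting exactly on the last edge is not swept into the final bin, which np.histogram treats as closed on the right. Note that in this example the shift also leaves the last sample (t = 5.0) just beyond the final edge, so it is not counted; this is why the counts below sum to 9 rather than 10.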

>>> (counts, _) = np.histogram(t, bins)
>>> counts
array([4, 1, 1, 0, 3])
>>> counts.cumsum()
array([4, 5, 6, 6, 9])

So v[0:4] is your first sample, v[4:5] your second, and so on: each cumulative count is the end index of one bin's slice.
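
As a rough sketch of how those counts could be turned into indices (not necessarily what the answer had in mind), the start of each bin's slice is the cumulative count minus the bin's own count, and empty bins can be masked out. The names ends, starts and first_idx are made up for illustration, and t is assumed to be sorted.

```python
import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0])
min_dif = 1.0

# Same bin edges as above, including the small left shift
bins = np.arange(t[0], t[-1] + min_dif, min_dif) - 1e-12
counts, _ = np.histogram(t, bins)

ends = counts.cumsum()           # exclusive end index of each bin's slice
starts = ends - counts           # start index of each bin's slice
first_idx = starts[counts > 0]   # first sample of every non-empty bin

t_first = t[first_idx]
v_first = v[first_idx]
```

Keep in mind that fixed bins are not equivalent to the loop in the question: the first samples of two adjacent bins can be closer together than min_dif (for instance 0.9 and 1.0), and with these edges the sample at t = 5.0 is left out.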

Pierre GM, answered Nov 14 '22 22:11


A simple solution would be interpolation, using e.g. np.interp:

vsampled = np.interp(np.arange(t[0], t[-1]), t, v)

This will not give you the indices of the values, though. It will, however, generate values by interpolation even at times for which the input arrays contain no sample. Note that arange is called here with its default step of 1 and stops before t[-1].
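
A slightly more explicit sketch of the same idea builds the evenly spaced time grid from min_dif and includes the end point when it falls on the grid. The names n, t_even and v_even are illustrative only, and np.interp requires t to be increasing.

```python
import numpy as np

t = np.array([0, 0.1, 0.2, 0.3, 1.0, 2.0, 4.0, 4.1, 4.3, 5.0])  # seconds
v = np.array([0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0, 5.0])
min_dif = 1.0  # desired spacing of the resampled grid

# Number of grid points that fit between t[0] and t[-1], inclusive
n = int(np.floor((t[-1] - t[0]) / min_dif)) + 1
t_even = t[0] + min_dif * np.arange(n)

# Linear interpolation of v onto the even grid
v_even = np.interp(t_even, t, v)
```

Building the grid as t[0] + min_dif * np.arange(n) avoids relying on how arange treats a floating-point stop value. The value at t = 3.0 is interpolated between the samples at 2.0 and 4.0, since no measurement exists there.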

silvado, answered Nov 14 '22 22:11