Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate differences between different times series with non-aligning times using Julia and Dataframes?

I have multiple time series with non-aligning time values. Plotted below. Simple time-pressure data. How would I compute a difference between a pressure in df1 with that in df2 when the rows have slightly different timestamps and the frames have different shapes (df2 has many more rows)?

simple data plot

If the timestamps aligned between dataframes, and the frames had the same number of rows, I would expect this very simple intermediate column would suffice, but clearly in this case it won't. (Each dataset comes from a different device)

wishfull thinking

How do I generate a new pair of time-pressure columns having some sort of fixed/specific time interval and interpolated values from the original time-pressure columns?

Or is there a more sensible/easier way to achieve what I want to do (essentially, I just want to accurately quantify the differences between "208417" and the other series)?

like image 343
byezy Avatar asked Oct 18 '25 21:10

byezy


2 Answers

when the rows have slightly different timestamps a closejoin must used first,e.g.

using InMemoryDatasets

df1=Dataset(Time=[Date("2022-09-28"),Date("2022-10-02"),Date("2022-10-03"),Date("2022-10-04")],Pressure=[10,11,9,15])

df2=Dataset(Time=[Date("2022-10-01"),Date("2022-10-02"),Date("2022-10-05")],Pressure=[7,9,19])

closejoin!(df1,df2,on=[:Time],makeunique=true,border=:missing)

modify!(df1,[:Pressure,:Pressure_1]=>byrow(-)=>:Difference1)

@df df1 plot(:Time,:Difference1,label="Difference1",color=:red)
like image 97
Julia_lover Avatar answered Oct 20 '25 16:10

Julia_lover


The general idea I propose, is to use a neutral regular grid of evaluation points to calculate difference between irregular datasets. In order to evaluate datasets on this grid we use linear interpolation. The comments in the code go through the steps:

using Dates
using DataFrames
using Interpolations

# setup some random data
startdate = DateTime("2022-10-10", "yyyy-mm-dd")
df = DataFrame(
  time = startdate .+ Second.(rand(1:500_000, 20)), 
  pressure = 10 .+ rand(20)
)
df2 = DataFrame(
  time = startdate .+ Second.(rand(1:500_000, 20)), 
  pressure = 9 .+ rand(20)
)
# sort data by timestamp needed for interpolation
sort!(df, order(:time))
sort!(df2, order(:time))
# create numeric time value for interpolation
# TODO: suggestions for better method welcome!
time2val(x) = x.instant.periods.value
df.time2 = time2val.(df.time)/1000.0;
df2.time2 = time2val.(df2.time)/1000.0;

# create interpolation objects
itp = interpolate((df.time2,), df.pressure, Gridded(Linear()));
itp2 = interpolate((df2.time2,), df2.pressure, Gridded(Linear()));

# calculate a grid to evaluate difference on
N = 2*(nrow(df)+nrow(df2))
mintime = max(minimum(df2.time2), minimum(df.time2))
maxtime = min(maximum(df2.time2), maximum(df.time2))
interprange = range(mintime, maxtime, N)

# calculate average difference on interpolated grid
sum(itp[t]-itp2[t] for t in interprange)/N

The result is ~1.12 which stems from the data generation difference between 9 and 10 offsets on random pressures.

like image 29
Dan Getz Avatar answered Oct 20 '25 17:10

Dan Getz



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!