
How to normalize unix timestamps to sum to discrete numeric values?

I have an "ideal" formula that combines a sum of values, like

score[i] = SUM(properties[i]) * frequency[i] + recency[i]

with properties being a vector of values, and frequency and recency scalar values, taken from a given dataset of N items. While all variables here are numeric with discrete integer values, the recency value is a UNIX timestamp in a given time range (like the last month, or the last week, etc., on a daily basis).

In the dataset, each item i has a date value expressed as recency[i], a frequency value frequency[i], and a list of properties properties[i]. All properties of item[i] are therefore evaluated on each day, expressed as recency[i], within the proposed time range.

According to this formula, the recency contribution to the score of item[i] is a negative one: the older the timestamp, the better the score (hence the + sign in the formula).
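
For concreteness, a minimal sketch of this computation in Python, with made-up toy values (the names follow the formula above):

properties = [[2, 5], [1, 1], [4, 0]]              # per-item property vectors
frequency  = [3, 7, 2]                             # per-item frequencies
recency    = [1698570000, 1698656400, 1698742800]  # per-item UNIX timestamps

# score[i] = SUM(properties[i]) * frequency[i] + recency[i]
score = [sum(properties[i]) * frequency[i] + recency[i]
         for i in range(len(properties))]

Note how the raw timestamps dwarf the other terms, which is exactly the normalization problem.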

My idea was to rescale the values to the given range, like

from sklearn.preprocessing import MinMaxScaler
import numpy as np
values = np.array(recencyVec, dtype=float).reshape(-1, 1)  # scikit-learn expects a 2D array
scaler = MinMaxScaler(feature_range=(min(recencyVec), max(recencyVec)))
normalized = scaler.fit_transform(values)

where recencyVec collects the recency values of all data points, min(recencyVec) being the first day and max(recencyVec) the last day in the range.

This uses the scikit-learn MinMaxScaler, transforming the recency values by scaling each feature to the given range, as suggested in How to Normalize and Standardize Time Series Data in Python.
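
Note that in scikit-learn, feature_range is the target range of the transformed data, not the input range, so passing (min(recencyVec), max(recencyVec)) maps the timestamps back onto their own range (an identity transform for a single feature). A minimal sketch scaling to (0, 1) instead, with made-up timestamps:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

recencyVec = [1698570000, 1698656400, 1698742800]          # made-up daily timestamps
values = np.array(recencyVec, dtype=float).reshape(-1, 1)  # 2D column, as scikit-learn expects
scaler = MinMaxScaler(feature_range=(0, 1))                # target range of the output
normalized = scaler.fit_transform(values)                  # first day -> 0.0, last day -> 1.0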

Is this the correct approach for this numerical formulation? What alternative approaches are possible to normalize the timestamp values when they are summed with other discrete numeric values?

asked Oct 29 '21 by loretoparisi

1 Answer

Is recency then an absolute UNIX timestamp, or do you already subtract the current timestamp? If not, then depending on your goal, it might be sufficient to simply subtract the current UNIX timestamp from recency, so that it consistently describes "seconds from now", i.e. a time delta instead of an absolute UNIX time. Of course, that would still produce quite a large score, but it will be consistent.
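
A minimal sketch of that subtraction, with made-up timestamps (flip the sign if your formula needs the opposite ordering):

import time

recency = [1698570000, 1698656400, 1698742800]  # made-up absolute UNIX timestamps
now = int(time.time())
delta = [now - r for r in recency]              # seconds elapsed since each event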

What scaling you use depends on your goal (what is an acceptable score?), but many choices are valid as long as they are monotonic. In addition to min-max scaling (where I would suggest using 0 as the minimum and setting the maximum to some known maximum time offset), you might also want to consider a log transformation.
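
A minimal sketch of both options, assuming the non-negative deltas from above and a made-up maximum time offset of 30 days:

import numpy as np

delta = np.array([3600, 86400, 2592000], dtype=float)  # made-up deltas in seconds
MAX_OFFSET = 30 * 24 * 3600.0                          # assumed known maximum offset: 30 days

minmax_scaled = delta / MAX_OFFSET  # min-max with the minimum fixed at 0, maximum at MAX_OFFSET
log_scaled = np.log1p(delta)        # log transform; log1p keeps a delta of 0 finite

Both transforms are monotonic, so they preserve the ranking of items by recency.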

answered Oct 21 '22 by hyit