I have an "ideal" formula in which I compute a score as

```
score[i] = SUM(properties[i]) * frequency[i] + recency[i]
```

where `properties[i]` is a vector of values, and `frequency[i]` and `recency[i]` are scalar values, taken from a given dataset of N items. While all variables here are numeric with discrete integer values, the `recency` value is a UNIX timestamp in a given time range (like 1 month from now, or 1 week from now, etc., on a daily basis).

In the dataset, each item `i` has a date value expressed as `recency[i]`, a frequency value `frequency[i]`, and the list `properties[i]`. All properties of item `i` are therefore evaluated on each day expressed as `recency[i]` in the proposed time range.

According to this formula the recency contribution to the `score` value for item `i` is a negative contribution: the older the timestamp, the better the score (hence the `+` sign in that formula).
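For concreteness, the formula above can be sketched with NumPy; the data values here are hypothetical, invented only to show the shapes involved:

```python
import numpy as np

# Hypothetical dataset of N = 3 items (all values are illustrative).
properties = np.array([[1, 2, 3],
                       [4, 0, 1],
                       [2, 2, 2]])   # properties[i]: a vector per item
frequency = np.array([5, 2, 7])      # frequency[i]: a scalar per item
recency = np.array([1700000000,      # recency[i]: a UNIX timestamp per item
                    1700086400,
                    1700172800])

# score[i] = SUM(properties[i]) * frequency[i] + recency[i]
score = properties.sum(axis=1) * frequency + recency
print(score)  # the raw timestamps dominate the other terms
```

Note how the raw timestamps (on the order of 1.7e9) swamp the `SUM(properties) * frequency` term, which is exactly why the question about normalizing `recency` arises.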
My idea was to use a re-scaler approach over the given range, using the scikit-learn `MinMaxScaler` object:

```
scaler = MinMaxScaler(feature_range=(min(recencyVec), max(recencyVec)))
scaler = scaler.fit(values)
normalized = scaler.transform(values)
```

where `recencyVec` collects all `recency` values for each data point, so that `min(recencyVec)` is the first day and `max(recencyVec)` is the last day. This transforms the `recency` values by scaling each feature to the given range, as suggested in "How to Normalize and Standardize Time Series Data in Python".
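One detail worth noting about the snippet above: in scikit-learn, `feature_range` is the *target* range of the output, not the range of the input data (the data's own min/max are learned by `fit()`), and the input must be a 2-D array. A runnable sketch with hypothetical timestamp data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical recency values: UNIX timestamps on four consecutive days.
recencyVec = np.array([1700000000, 1700086400, 1700172800, 1700259200])

# feature_range is the *output* range; the input min/max are learned by
# fit(). Scikit-learn expects shape (n_samples, n_features), hence reshape.
values = recencyVec.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1)).fit(values)
normalized = scaler.transform(values)
print(normalized.ravel())  # oldest day -> 0.0, newest day -> 1.0
```

Passing `(min(recencyVec), max(recencyVec))` as `feature_range` would map the data back onto its own timestamp range, i.e. leave it effectively unscaled.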
Is this the correct approach for this numerical formulation? What alternative approaches could normalize the timestamp values when they are summed with other discrete numeric values?
Is `recency` an absolute UNIX timestamp? Or do you already subtract the current timestamp? If not, then depending on your goal, it might be sufficient to simply subtract the current UNIX timestamp from `recency`, so that it consistently describes "seconds from now", i.e. a time delta instead of an absolute UNIX time. Of course, that would create quite a large score, but it will be consistent.

Which scaling you use depends on your goal (what is an acceptable `score`?), but many are valid as long as they're monotonic. In addition to min-max scaling (where I would suggest using `0` as the minimum and setting the maximum to some known maximum time offset), you might also want to consider a log transformation.
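The two suggestions above can be sketched as follows; the timestamps and the 30-day maximum offset are assumptions chosen purely for illustration:

```python
import time

import numpy as np

# Hypothetical absolute UNIX timestamps within the last week.
now = int(time.time())
recency = np.array([now - 1 * 86400, now - 3 * 86400, now - 7 * 86400])

# Time delta instead of absolute time: seconds elapsed since each event.
delta = now - recency            # 1, 3 and 7 days, in seconds

# Two monotonic rescalings of the delta:
max_offset = 30 * 86400          # assumed known maximum time offset (30 days)
minmax = delta / max_offset      # min-max style, with the minimum fixed at 0
logged = np.log1p(delta)         # log transformation; log1p handles delta == 0
print(minmax, logged)
```

Because both transformations are monotonic in `delta`, they preserve the ordering of items by recency; they only change how strongly old items are penalized relative to recent ones.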