
How to decide on weights?

Tags:

algorithm

For my work, I need some kind of algorithm with the following input and output:

Input: a set of dates (from the past). Output: a set of weights, one weight per given date (the sum of all weights = 1).

The basic idea is that the closest date to today's date should receive the highest weight, the second closest date will get the second highest weight, and so on...

Any ideas?

Thanks in advance!

Sash asked Dec 16 '22 10:12


2 Answers

First, for each date in your input set, compute the amount of time between that date and today.

For example: the following date set {today, tomorrow, yesterday, a week from today} becomes {0, 1, 1, 7}. Formally: val[i] = abs(today - date[i]).
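As a minimal sketch of this first step, assuming the dates are `datetime.date` objects and the distance is measured in whole days (the helper name `days_from_today` and the fixed value of `today` are made up for illustration):

```python
from datetime import date, timedelta

def days_from_today(dates, today):
    """Return the absolute distance in days between each date and today."""
    return [abs((today - d).days) for d in dates]

# A fixed "today" keeps the example reproducible (assumption for illustration).
today = date(2023, 1, 15)
dates = [
    today,                       # today
    today + timedelta(days=1),   # tomorrow
    today - timedelta(days=1),   # yesterday
    today + timedelta(days=7),   # a week from today
]
vals = days_from_today(dates, today)
print(vals)  # [0, 1, 1, 7]
```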

Second, invert the values so that their relative order is reversed: the smallest distance should end up with the largest value. The simplest way of doing so would be: val[i] = 1/val[i].

Other suggestions:

  • val[i] = 1/val[i]^2
  • val[i] = 1/sqrt(val[i])
  • val[i] = 1/log(val[i]) (only usable for val[i] > 1, because log(1) = 0 and the logarithm is negative below 1)
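To get a feel for how these choices differ, here is a rough sketch that applies each inversion to the same made-up distances and normalizes the results so they can be compared. The sample values are an assumption, chosen to be greater than 1 so that every formula is defined:

```python
import math

# Hypothetical distances in days, all greater than 1 so every formula is defined.
vals = [2, 8, 31, 366]

inversions = {
    "1/x": lambda x: 1 / x,
    "1/x^2": lambda x: 1 / x ** 2,
    "1/sqrt(x)": lambda x: 1 / math.sqrt(x),
    "1/log(x)": lambda x: 1 / math.log(x),
}

def normalize(values):
    """Scale the values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

results = {}
for name, invert in inversions.items():
    results[name] = normalize([invert(v) for v in vals])
    print(f"{name:>10}: " + ", ".join(f"{w:.3f}" for w in results[name]))
```

Among the power-based options, a larger exponent concentrates more of the total weight on the closest date: 1/x^2 is more peaked than 1/x, which in turn is more peaked than 1/sqrt(x).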

The hardest and most important part is deciding how to invert the values. Think about what the nature of the weights should be: do you want noticeable differences between two far-away dates, or should two far-away dates have nearly equal weights? Should a date very close to today get a dramatically larger weight, or only a moderately larger one?

Note that the inverting procedure must never divide by zero. In the example above, today's value is 0, so computing 1/val[i] fails for it. One method to avoid division by zero is called smoothing. The most trivial way to "smooth" your data is add-one smoothing, where you just add one to each value before inverting (so today becomes 1, tomorrow becomes 2, next week becomes 8, etc.).
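A small sketch of add-one smoothing combined with the plain 1/x inversion, using the example distances from above. The function name `smoothed_inverse` and the parameter `k` are hypothetical; `k=1` corresponds to add-one smoothing:

```python
def smoothed_inverse(vals, k=1):
    """Invert distances after adding k, so a distance of 0 cannot divide by zero."""
    return [1 / (v + k) for v in vals]

vals = [0, 1, 1, 7]  # today, tomorrow, yesterday, a week from today
inv = smoothed_inverse(vals)
print(inv)  # [1.0, 0.5, 0.5, 0.125]
```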

Now the easiest part is to normalize the values so that they'll sum up to one.

sum = val[1] + val[2] + ... + val[n]
weight[i] = val[i]/sum for each i
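Putting the three steps together (distance, smoothed inversion, normalization) could look roughly like the following. This is only a sketch under the assumptions that dates are `datetime.date` objects and distances are counted in days; `date_weights` and its `power` parameter are illustrative names, with `power=2` giving the 1/x^2 variant and `power=0.5` the 1/sqrt(x) variant:

```python
from datetime import date

def date_weights(dates, today, power=1.0):
    """Weight each date by 1 / (days_from_today + 1) ** power, normalized to sum to 1."""
    raw = [1 / (abs((today - d).days) + 1) ** power for d in dates]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical input: one day, one week and thirty days before "today".
today = date(2022, 12, 16)
dates = [date(2022, 12, 15), date(2022, 12, 9), date(2022, 11, 16)]
weights = date_weights(dates, today)

for d, w in zip(dates, weights):
    print(d, round(w, 4))
```

Raising `power` makes the closest date dominate more strongly, while lowering it toward 0 moves the weights toward equal.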

snakile answered Jan 15 '23 10:01


  • Sort the dates and remove duplicates
  • Assign values (maybe starting from the farthest date in steps of 10 or whatever you need; these values can be arbitrary, they just reflect order and distance)
  • Normalize the weights so they add up to 1

Executable pseudocode (tweakable):

#!/usr/bin/env python

import pprint
import random

# For simplicity's sake, dates are integers here; pivot_date plays the role of today.
pivot_date = 1000
past_dates = set(random.sample(range(1, pivot_date), 5))

# Walk the dates from farthest to closest, giving each one a larger score than the last.
weights, stepping = [], 10

for date in sorted(past_dates):
    weights.append((date, stepping))
    stepping += 10

# Normalize the scores so that the weights add up to 1.
sum_of_steppings = sum(w for _, w in weights)
normalized = [(d, w / float(sum_of_steppings)) for d, w in weights]

pprint.pprint(normalized)

# Example output
# The 'date' closest to 1000 (here: 889) has the highest weight, 
# 703 the second highest, and so forth ...
# [(151, 0.06666666666666667),
#  (425, 0.13333333333333333),
#  (571, 0.2),
#  (703, 0.26666666666666666),
#  (889, 0.3333333333333333)]
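
The same rank-based idea can be sketched with real calendar dates instead of integers. This assumes every input date lies in the past, so the latest date is also the one closest to today; `rank_weights` and `step` are hypothetical names:

```python
from datetime import date

def rank_weights(dates, step=10):
    """Give the k-th oldest unique date a score of k * step, then normalize."""
    ordered = sorted(set(dates))  # oldest first, duplicates removed
    scores = [step * (i + 1) for i in range(len(ordered))]
    total = sum(scores)
    return {d: s / total for d, s in zip(ordered, scores)}

# Hypothetical input with one duplicate date.
weights = rank_weights([
    date(2022, 12, 1),
    date(2022, 10, 5),
    date(2022, 12, 1),
    date(2022, 11, 20),
])

for d, w in weights.items():
    print(d, round(w, 4))
```

With a constant step, the step size cancels out during normalization, so the weights here depend only on each date's rank and not on how far apart the dates actually are. To make distance matter, the scores would have to be derived from the dates themselves.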

miku answered Jan 15 '23 08:01