Define an algorithm which gets a number and a list and returns a scalar based on number's distance to average of the list

Suppose we have a list to which we append an integer between 15 and 32 (call the integer rand) on each iteration. I want to design an algorithm that assigns each rand a reward around 1 (between 0.75 and 1.25). The rule for assigning the reward goes like this:

First we calculate the average of the list. If rand is greater than the average, the reward should be less than 1; if rand is less than the average, the reward should be greater than 1. The farther rand is from the average, the more the reward increases or decreases. For example:

rand = 15, avg = 23 then reward = 1.25

rand = 32, avg = 23 then reward = 0.75

rand = 23, avg = 23 then reward = 1 and so on.

I developed the code below for this algorithm:

import numpy as np

rollouts = np.array([])
i = 0

def modify_reward(lst, rand):
    reward = 1
    constant1 = 0.25   # maximum deviation from the base reward
    constant2 = 1      # steepness of the sigmoid

    std = np.std(lst)
    global avg         # kept global so the prints below can read it
    avg = np.mean(lst)
    sub = np.subtract(avg, rand)
    landa = sub / std if std != 0 else 0
    # squash landa into (-1, 1) with a sigmoid, then scale into +/- constant1
    coefficient = -1 + (2 / (1 + np.exp(-constant2 * landa)))
    md_reward = reward + (reward * constant1 * coefficient)
    return md_reward

while i < 100:
    rand = np.random.randint(15, 33)
    rollouts = np.append(rollouts, rand)
    modified_reward = modify_reward(rollouts, rand)
    i += 1
    print([i, rand, avg, modified_reward])

# test the reward at the lower and upper bounds

rand1, rand2 = 15, 32
reward1, reward2 = modify_reward(rollouts, rand1), modify_reward(rollouts, rand2)

print(['reward for lower bound', rand1, avg, reward1])
print(['reward for upper bound', rand2, avg, reward2])

The algorithm works reasonably well, but if you look at the examples below, you will notice the problem with it.

rand = 15, avg = 23.94 then reward = 1.17 # which should be 1.25

rand = 32, avg = 23.94 then reward = 0.84 # which should be 0.75

rand = 15, avg = 27.38 then reward = 1.15 # which should be 1.25

rand = 32, avg = 27.38 then reward = 0.93 # which should be 0.75

As you might have noticed, the algorithm doesn't account for the distance between avg and the bounds (15, 32). The closer avg moves to the lower or upper bound, the more unbalanced modified_reward becomes.

I need modified_reward to be assigned uniformly, no matter whether avg moves toward the upper bound or the lower bound. Can anyone suggest a modification to this algorithm that takes into account the distance between avg and the bounds of the list?

asked Sep 29 '17 by Masoud Masoumi Moghadam


3 Answers

Putting together these two requirements:

if rand is more than average, we expect the reward to be less than 1, and if rand is less than average, the reward gets higher than 1.

I need modified_reward to be uniformly assigned, no matter avg moves toward upper bound or lower bound.

is slightly tricky, depending on what you mean by 'uniformly'.

If you want 15 to always be rewarded with 1.25, and 32 to always be rewarded with 0.75, you can't have a single linear relationship while also respecting your first requirement.

If you are happy with two linear relationships, you can aim for a situation where modified_reward depends on rand like this:

[Plot: reward vs. rand, two linear segments falling from (15, 1.25) to (avg, 1) and from (avg, 1) to (32, 0.75)]

which I produced with this Wolfram Alpha query. As you can see, this is two linear relationships, with a 'knee' at avg. I expect you'll be able to derive the formulae for each part without too much trouble.
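The two linear pieces described above can be sketched directly; this is one possible derivation (the function name is mine, not from the answer), with different slopes on each side of avg so the endpoints always hit exactly 1.25 and 0.75:

```python
def piecewise_reward(rand, avg, lo=15, hi=32):
    """Two linear segments with a 'knee' at avg.

    Assumes lo < avg < hi; the endpoint cases divide by zero otherwise.
    """
    if rand >= avg:
        # right segment: (avg, 1.0) down to (hi, 0.75)
        return 1.0 - 0.25 * (rand - avg) / (hi - avg)
    else:
        # left segment: (lo, 1.25) down to (avg, 1.0)
        return 1.0 + 0.25 * (avg - rand) / (avg - lo)

print(piecewise_reward(15, 23))  # 1.25
print(piecewise_reward(32, 23))  # 0.75
print(piecewise_reward(23, 23))  # 1.0
```

Because each side is normalized by its own span (avg - lo or hi - avg), the bounds map to 1.25 and 0.75 no matter where avg sits between them.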

answered Nov 18 '22 by AakashM


This code implements a linear distribution of weights, proportional to the distance from the average toward your given limits.

import numpy as np

class Rewarder(object):
    lo = 15
    hi = 32
    weight = 0.25

    def __init__(self):
        self.lst = np.array([])

    def append(self, x):
        self.lst = np.append(self.lst, [x])

    def average(self):
        return np.mean(self.lst)

    def distribution(self, a, x, b):
        '''
            Return a number between 0 and 1 proportional to 
            the distance of x from a towards b.
            Note: Modify this fraction if you want a normal distribution
            or quadratic etc.
        '''
        return (x - a) / (b - a)

    def reward(self, x):
        avg = self.average()

        if x > avg :
            w = self.distribution(avg, x, self.hi)
        else:
            w = - self.distribution(avg, x, self.lo)

        return 1 - self.weight * w

rollouts = Rewarder()
rollouts.append(23)
print(rollouts.reward(15))
print(rollouts.reward(32))
print(rollouts.reward(23))

Producing:

1.25
0.75
1.0

The code in your question seems to be using np.std which I presume is an attempt to get a normal distribution. Remember that the normal distribution never actually gets to zero.

If you tell me what shape you want for the distribution we can modify Rewarder.distribution to suit.
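For instance, a shape that stays flatter near the average and steepens toward the bounds could use a squared fraction. This is only an illustrative sketch of what such a swap might look like (the function name is mine), written standalone so it runs on its own rather than as a subclass of Rewarder:

```python
import numpy as np

def quadratic_reward(lst, x, lo=15, hi=32, weight=0.25):
    # Same structure as Rewarder.reward, but the linear fraction is
    # squared: weights grow slowly near the average and faster toward
    # the bounds. The endpoints still map to 1 +/- weight.
    avg = np.mean(lst)
    if x > avg:
        w = ((x - avg) / (hi - avg)) ** 2
    else:
        w = -((x - avg) / (lo - avg)) ** 2
    return 1 - weight * w

print(quadratic_reward([23], 15))  # 1.25
print(quadratic_reward([23], 32))  # 0.75
print(quadratic_reward([23], 19))  # 1.0625 (vs. 1.125 for the linear shape)
```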

Edit:

I can't access the paper you refer to, but I infer that you want a sigmoid-style distribution of rewards: a weight of 0 at the mean and approximately +/-0.25 at min and max. Using the error function as the weighting, if we scale by 2 we get approximately 0.995 at min and max.

Override the Rewarder.distribution:

import math

class RewarderERF(Rewarder):

    def distribution(self, a, x, b):
        """
Return an Error Function (sigmoid) weighting of the distance from a.
            Note: scaled to reduce error at max to ~0.003
            ref: https://en.wikipedia.org/wiki/Sigmoid_function
        """
        return math.erf(2.0 * super(RewarderERF, self).distribution(a, x, b))


rollouts = RewarderERF()
rollouts.append(23)
print(rollouts.reward(15))
print(rollouts.reward(32))
print(rollouts.reward(23))

results in:

1.24878131454
0.75121868546
1.0

You can choose which error function suits your application and how much error you can accept at the min and max. I'd also expect you'd integrate all these functions into your class; I've split everything out here so we can see the parts.

Regarding calculating the mean: do you need to keep the list of values and recalculate each time, or can you keep a count and a running total of the sum? Then you would not need numpy for this calculation.
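A minimal sketch of that running-total idea (class name is mine, not from the answer):

```python
class RunningMean:
    """Mean via a count and a running total, with no stored list."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def append(self, x):
        self.count += 1
        self.total += x

    def average(self):
        return self.total / self.count

m = RunningMean()
for v in (15, 23, 32):
    m.append(v)
print(m.average())  # 23.333...
```

This keeps append and average both O(1), versus np.append copying the whole array on every call.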

answered Nov 18 '22 by Mike Robins


I don't understand why you are calculating md_reward like this; please provide the logic and reasoning behind it. In any case,

landa = sub / std if std != 0 else 0
coefficient =  -1 + ( 2 / (1 + np.exp(-constant2 * landa)))
md_reward = reward + (reward * constant1 * coefficient)

will not give what you are looking for. Consider the cases below:

for md_reward to be 0.75
-->  coefficient should be -1
    -->  landa == -infinity (a large negative value, i.e. rand much larger than avg, relative to std)

for md_reward to be 1
-->  coefficient should be 0
    -->  landa == 0 (std == 0 or sub == 0) # which is possible

for md_reward to be 1.25
-->  coefficient should be 1
    -->  landa == +infinity (a large positive value, i.e. rand much smaller than avg, relative to std)

If you want to normalize the reward from avg to max and from avg to min, check these links:

https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range
https://stats.stackexchange.com/questions/70553/what-does-normalization-mean-and-how-to-verify-that-a-sample-or-a-distribution

Now modify your function to something like this:

def modify_reward(lst, rand):
    reward = 1
    constant1 = 0.25
    min_value = 15
    max_value = 32
    avg = np.mean(lst)
    if rand >= avg:
        md_reward = reward - constant1*(rand - avg)/(max_value - avg) # normalize rand from avg to max
    else:
        md_reward = reward + constant1*(1 - (rand - min_value)/(avg - min_value)) # normalize rand from min to avg
    return md_reward

I have used the following normalization:

Normalized:
(X − min(X)) / (max(X) − min(X))

For the case rand >= avg, min(X) is avg and max(X) is max_value; for the case rand < avg, min(X) is min_value and max(X) is avg.
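A quick sanity check using the asker's own failing cases (the function is repeated here so the snippet runs on its own; the keyword defaults are my addition):

```python
import numpy as np

def modify_reward(lst, rand, reward=1, constant1=0.25, min_value=15, max_value=32):
    avg = np.mean(lst)
    if rand >= avg:
        # normalize rand from avg to max
        md_reward = reward - constant1 * (rand - avg) / (max_value - avg)
    else:
        # normalize rand from min to avg
        md_reward = reward + constant1 * (1 - (rand - min_value) / (avg - min_value))
    return md_reward

# The bounds now map to 1.25 / 0.75 regardless of where avg sits:
for lst in ([23.94], [27.38]):
    print(modify_reward(lst, 15), modify_reward(lst, 32))  # 1.25 0.75
```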

Hope this helps.

answered Nov 18 '22 by surya singh