
Any example code of REINFORCE algorithm proposed by Williams?

Does anyone know of any example code for the algorithm Ronald J. Williams proposed in
"A class of gradient-estimating algorithms for reinforcement learning in neural networks"?

asked Feb 11 '15 by Alex Gao


1 Answer

Yes, do a search on GitHub, and you will get a whole bunch of results:

GitHub: WILLIAMS+REINFORCE

The most popular ones build on this code from PyBrain (in Python):

__author__ = 'Thomas Rueckstiess, [email protected]'

from pybrain.rl.learners.directsearch.policygradient import PolicyGradientLearner
from scipy import mean, ravel, array


class Reinforce(PolicyGradientLearner):
    """ Reinforce is a gradient estimator technique by Williams (see
        "Simple Statistical Gradient-Following Algorithms for
        Connectionist Reinforcement Learning"). It uses optimal
        baselines and calculates the gradient with the log likelihoods
        of the taken actions. """

    def calculateGradient(self):
        # normalize rewards
        # self.ds.data['reward'] /= max(ravel(abs(self.ds.data['reward'])))

        # initialize variables
        returns = self.dataset.getSumOverSequences('reward')
        seqidx = ravel(self.dataset['sequence_index'])

        # sum of sequences up to n-1
        loglhs = [sum(self.loglh['loglh'][seqidx[n]:seqidx[n + 1], :])
                  for n in range(self.dataset.getNumSequences() - 1)]
        # append sum of last sequence as well
        loglhs.append(sum(self.loglh['loglh'][seqidx[-1]:, :]))
        loglhs = array(loglhs)

        baselines = mean(loglhs ** 2 * returns, 0) / mean(loglhs ** 2, 0)
        # TODO: why gradient negative?
        gradient = -mean(loglhs * (returns - baselines), 0)

        return gradient
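
If you would rather not depend on PyBrain, here is a minimal, self-contained sketch of episodic REINFORCE in plain NumPy. The 5-state corridor environment, the run_episode and reinforce_update helpers, and the running-average baseline are all made up for illustration (Williams' paper, like the PyBrain code above, uses an optimal baseline); treat it as a sketch of the technique, not a drop-in implementation.

# Minimal REINFORCE sketch (NOT the PyBrain code above).
# The corridor environment and helper names below are illustrative assumptions.
import numpy as np

N_STATES, N_ACTIONS = 5, 2               # corridor of 5 cells; actions: 0 = left, 1 = right
theta = np.zeros((N_STATES, N_ACTIONS))  # policy parameters (softmax logits per state)

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def run_episode(theta, max_steps=50):
    """Roll out one episode and return a list of (state, action, reward)."""
    s, traj = 0, []
    for _ in range(max_steps):
        probs = softmax(theta[s])
        a = np.random.choice(N_ACTIONS, p=probs)
        s_next = max(0, s - 1) if a == 0 else s + 1
        done = s_next == N_STATES - 1    # reaching the right end terminates
        r = 1.0 if done else -0.01       # small step cost, +1 at the goal
        traj.append((s, a, r))
        s = s_next
        if done:
            break
    return traj

def reinforce_update(theta, traj, alpha=0.1, gamma=0.99, baseline=0.0):
    """One REINFORCE step: theta += alpha * (G_t - b) * grad log pi(a_t | s_t)."""
    grad = np.zeros_like(theta)
    # accumulate discounted returns backwards through the episode
    G, returns = 0.0, []
    for (_, _, r) in reversed(traj):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    for (s, a, _), G_t in zip(traj, returns):
        probs = softmax(theta[s])
        dlog = -probs
        dlog[a] += 1.0                   # gradient of log softmax w.r.t. the logits
        grad[s] += (G_t - baseline) * dlog
    return theta + alpha * grad

# Training loop with a simple running-average baseline (a common simplification,
# not the optimal baseline Williams derives).
baseline = 0.0
for episode in range(500):
    traj = run_episode(theta)
    total = sum(r for (_, _, r) in traj)
    baseline = 0.9 * baseline + 0.1 * total
    theta = reinforce_update(theta, traj, baseline=baseline)

print(softmax(theta[0]))  # probability of moving right from the start should end up high

The core update, return times the gradient of the action's log-probability minus a baseline, is the same log-likelihood-times-return estimator that the PyBrain code assembles from loglhs, returns and baselines above.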
answered Oct 27 '22 by dberm22