I saw someone mention the reword function today, but documentation for it is very brief. It looks like shell script environment variable substitution, or maybe regex substitution, but different. How do I use this function and what kind of gotchas am I going to run into?
The reword function is a bit of an experiment to add shell-style string interpolation to Rebol in a way that works with the way we do things. Unlike a lot of Rebol's series functions, it really is optimized for working on just string types, and the design reflects that. The current version is a design prototype, meant to eventually be redone as a native, but it does work as designed so it makes sense to talk about how it works and how to use it.
What does reword do?
Basically this:
>> reword "$a is $b." [a "This" b "that"]
== "This is that."
It takes a template string, it searches for the escape sequences, and replaces those with the corresponding substitution values. The values are passed to the function as well, as an object, a map, or a block of keys and values. The keys can be pretty much anything, even numbers:
>> reword "$1 is $2." [1 "This" 2 "that"]
== "This is that."
The keys are converted to strings if they aren't strings already. Keys are considered to be the same if they would be converted to the same string, which is what happens when you do something like this:
>> reword "A $a is $a." [a "fox" "a" "brown"]
== "A brown is brown."
It's not positional like regex replacement, it's keyword based. If you have a key that is specified more than once in the values block, the last value for that key is the one that gets used, as we just saw. Any unset or none values are just skipped, since those have no meaning when putting stuff into a string.
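If it helps to see those matching rules spelled out, here is a rough Python model of them. It is only a sketch of the behavior described above, not how reword is implemented: reword_model, its flat pairs list and the escape parameter are names made up for this illustration, and dropping None pairs before matching is an assumption about what "skipped" means.

```python
import re

def reword_model(template, pairs, escape="$"):
    """Keyword substitution over a flat [key, value, key, value, ...] list."""
    values = {}
    for key, value in zip(pairs[0::2], pairs[1::2]):
        if value is None:
            continue  # assumption: a none value contributes no replacement
        # Keys are compared as strings; a later duplicate overwrites an earlier one.
        values[str(key).lower()] = str(value)
    if not values:
        return template
    # Longest keys first, so a short key cannot shadow a longer one
    # (a choice of this model, not something the answer states).
    alternatives = "|".join(re.escape(k) for k in sorted(values, key=len, reverse=True))
    pattern = re.compile(re.escape(escape) + "(" + alternatives + ")", re.IGNORECASE)
    return pattern.sub(lambda m: values[m.group(1).lower()], template)

print(reword_model("A $a is $a.", ["a", "fox", "a", "brown"]))  # A brown is brown.
print(reword_model("$1 is $2.", [1, "This", 2, "that"]))        # This is that.
```

The dictionary is what makes this keyword based rather than positional: the order of the pairs only matters when the same key shows up twice.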
You can use other escape flags too, even multi-character ones:
>> reword/escape "A %%a is %%b." [a "fox" b "brown"] "%%"
== "A fox is brown."
Or even have no escape flag at all, and it will replace the key everywhere:
>> reword/escape "I am answering you." [I "Brian" am "is" you "Adrian"] none
== "Brian is answerBrianng Adrian."
Whoops, that didn't work. This is because the keys aren't case-sensitive, and they don't need to be surrounded by spaces or other such delimiters. But, you can put spaces in the keys themselves if you specify them as strings, so this works better:
>> reword/escape "I am answering you." ["I am" "Brian is" you "Adrian"] none
== "Brian is answering Adrian."
Still, doing reword templates without escape characters tends to be tricky and a little bit slower, so it's not done as often.
There's an even better trick though...
Where reword gets really interesting is when you use a function as a replacement value, since that function gets called with every rewording. Say, you wanted to replace with a counter:
>> reword "$x then $x and $x, also $x" object [y: 1 x: does [++ y]]
== "1 then 2 and 3, also 4"
Or maybe even the position, since it can take the string position as a parameter:
>> reword "$x then $x and $x, also $x" object [x: func [s] [index? s]]
== "1 then 9 and 16, also 25"
Wait, that doesn't look right, those numbers seem off. That is because the function is returning the indexes of the template string, not the result string. Good to keep that in mind when writing these functions. The function doesn't even have to just be assigned to one key, it can detect or use it:
>> reword "$x or $y" object [x: y: func [s] [ajoin ["(it's " copy/part s 2 ")"]]]
== "(it's $x) or (it's $y)"
See, template variables, escapes and all. And the function can have side effects, like this line counter:
>> reword/escape "Hello^/There^/nl" use [x] [x: 0 map reduce ["^/" does [++ x "^/"] "nl" does [x]]] ""
== "Hello^/There^/2"
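The function-valued replacements above can also be sketched outside Rebol. The following Python model is an illustration under stated assumptions, not reword's implementation: reword_model is a hypothetical name, and passing the callable a 1-based template index stands in for the string position that Rebol hands to the function.

```python
import itertools
import re

def reword_model(template, values, escape="$"):
    """Replace escape+key with a value; callables are called once per match."""
    keys = {str(k).lower(): v for k, v in values.items()}
    if not keys:
        return template
    # Longest keys first, a choice of this model rather than a documented rule.
    alternatives = "|".join(re.escape(k) for k in sorted(keys, key=len, reverse=True))
    pattern = re.compile(re.escape(escape) + "(" + alternatives + ")", re.IGNORECASE)

    def substitute(match):
        value = keys[match.group(1).lower()]
        if callable(value):
            # 1-based index of the escape sequence in the TEMPLATE, not the result.
            value = value(match.start() + 1)
        return str(value)

    return pattern.sub(substitute, template)

counter = itertools.count(1)
counted = reword_model("$x then $x and $x, also $x", {"x": lambda _index: next(counter)})
positions = reword_model("$x then $x and $x, also $x", {"x": lambda index: index})
print(counted)    # 1 then 2 and 3, also 4
print(positions)  # 1 then 9 and 16, also 25
```

Because every match position is measured against the untouched template, the second call gives 1, 9, 16 and 25 no matter how long the inserted text is.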
It even comes with the /into option, so you can use it to build strings in stages.
But the big gotcha for someone coming from a language with interpolation built in is...
Because Rebol just doesn't work that way. Rebol doesn't have lexical binding, it does something else, so in a string there is just no way to know where to get the values of variables from without saying so. In one of those shell languages that has interpolation, it would be the equivalent of having to pass a reference to the whole environment to the interpolation function. But hey, we can do just that in Rebol:
>> use [x] [x: func [s] [index? s] reword "$x then $x and $x, also $x" bind? 'x]
== "1 then 9 and 16, also 25"
That bind? method will work in use, binding loops and functions. If you are in an object, you can also use self:
>> o: object [x: func [s] [index? s] y: func [s] [reword s self]]
== make object! [
x: make function! [[s][index? s]]
y: make function! [[s][reword s self]]
]
>> o/y "$x then $x and $x, also $x"
== "1 then 9 and 16, also 25"
But be careful, or you can end up doing something like this:
>> o/y "$x then $x and $x, also $x, finally $y"
** Internal error: stack overflow
Dragons! That's one good reason to keep your variables and your replacement keys separate...