Echo State Network learning Mackey-Glass function, but how?

I got this example of a minimal Echo State Network (ESN), which I am analysing while trying to understand Echo State Networks. Unfortunately, I have some problems understanding why it really works. It all boils down to these questions:

  • What defines the echo state of an ESN?
  • What is it that makes an ESN learn complex nonlinear functions like the Mackey-Glass function so easily and so fast?

First here is a little piece of code that shows the important part of initialization:

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Generate the ESN reservoir
% 
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

rand('seed', 42);

trainLen = 2000;
testLen  = 2000;
initLen  = 100;
data     = load('MackeyGlass_t17.txt');

%         Input neurons
inSize  = 1; 
%         Output neurons 
outSize = 1;
%         Reservoir size
resSize = 1000;
%         Leaking rate
a       = 0.3; 
%         Input weights
Win     = ( rand(resSize, (inSize+1) ) - 0.5) .* 1;
%         Reservoir weights
W       = rand(resSize, resSize) - 0.5;

Running the reservoir:

I understand that every single data point of the input data set is propagated from the input neuron to the reservoir neurons. After a warm-up of length initLen, the states are accepted and stored in the matrix X. When this is done, every single column of X represents a "vector of reservoir neuron activations". And here comes the point where I am not sure if I got it right:

The comment already calls X the "collected states" or "design matrix". Am I getting this right, that all this does is store the state of the whole network in the columns of matrix X?

  • If we assume that t is just a time index, then X(:,t) represents the network state at time t (up to the initLen offset in the code), doesn't it?

In my example this would mean that there are 1900 time slices which represent the whole network state at their corresponding time step (X therefore is a 1002x1900 matrix). Another question that occurs to me here is:

  • Why is a 1 (I guess it is the bias) and the input value u prepended to this vector: X(:,t-initLen) = [1;u;x];
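For reference, the leaky-integration update that the loop below implements can be written as:

\tilde{x}(t) = \tanh\big( W_{\text{in}} \, [1; u(t)] + W \, x(t-1) \big), \qquad x(t) = (1-a)\,x(t-1) + a\,\tilde{x}(t)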

So:

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 
% Run the reservoir with the data and collect X.
% 
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%       Allocate memory for the design (collected states) matrix
X     = zeros((1+inSize) + resSize, trainLen - initLen);

%       Vector of reservoir neuron activations (used for calculation)
x     = zeros(resSize, 1);

%       Update of the reservoir neuron activations
xUpd  = zeros(resSize, 1);

for t = 1:trainLen
    
    u    = data(t);
    
    xUpd = tanh( Win * [1;u] + W * x );    
    x    = (1-a) * x + a * xUpd;
    
    if ( t > initLen )
        X(:,t-initLen) = [1;u;x];
    end
    
end

Training part:

The training part is also still a bit of magic to me. I am familiar with how linear regression works, so that is not the problem here.

What I see is that this part just uses the whole state matrix X and performs a single linear regression step against the target data to generate the output weight vector Wout, and that's it.

So all that has been done so far - if I'm not mistaken - is computing the output weights from the state matrix X, which itself was generated using the input data and the randomly generated (input and reservoir) weights.

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 
% Train the output
% 
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%       Set the corresponding target matrix directly
Yt    = data(initLen+2:trainLen+1)';

%       Regularization coefficient
reg   = 1e-8;  

%       Get X transposed - needed twice, so computing it once is a little faster
X_T   = X';

%       Ridge regression (regularized linear regression): Wout = Yt * X' * inv(X*X' + reg*I)
Wout  = Yt * X_T * (X * X_T + reg * eye(1+inSize+resSize))^(-1);
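Written out, this line is the closed-form solution of ridge regression (Tikhonov-regularized linear regression):

W_{\text{out}} = Y_t \, X^{\top} \big( X X^{\top} + \beta I \big)^{-1}

where \beta is the regularization coefficient reg and I is the identity matrix of size 1+inSize+resSize.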

Running the ESN in a generative mode:

I can run this in two modes: generative or predictive. But this is the part where all I can say is "Well... it works", without having an exact idea of why it does.

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 
% Run the trained ESN in generative mode. No need to initialize x here,
% because x still holds the final state from training and we continue
% from there.
% 
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Y = zeros(outSize,testLen);
u = data(trainLen+1);

for t = 1:testLen 
    
    xUpd   = tanh( Win*[1;u] + W*x );
    x      = (1-a)*x + a*xUpd;
    
    %        Generative mode:
    u      = Wout*[1;u;x];

    %      This would be a predictive mode:
    %u      = data(trainLen+t+1);

    Y(:,t) = u;
    
end

It works pretty well as you can see (generative mode):

[Figure: the ESN output in generative mode closely tracking the Mackey-Glass target signal]

I know this is quite a huge "question", if it can even be considered one. I feel like I understand the individual parts, but what I'm missing is the big picture of this magic black box called Echo State Network.

asked Feb 21 '14 by Stefan Falk

1 Answer

The echo state network (ESN) is basically a clever way to train a recurrent neural network. An ESN has a "reservoir" of hidden units which are coupled. The inputs are connected to the reservoir via input-to-hidden connections (plus a bias). These connections are not trained; they are randomly initialized, and this is the code snippet that does the initialization (I am using Python):

Win = (random.rand(resSize,1+inSize)-0.5) * 1

The units in the reservoir are coupled, which basically means that there exist hidden-to-hidden connections. Again, the weights in the reservoir are not trained but initialized. However, the initialization of the reservoir weights is tricky: those weights (depicted by W in the code) are first randomly initialized and then multiplied by a factor which takes the spectral radius of the random matrix into account. Careful initialization of these connections is very important because it affects the dynamics of the ESN (do not forget that it is a recurrent network). If you want to know more details about this, you will need some background in linear system theory.
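This scaling step is not shown in the question's snippet. A minimal sketch of it, in the same NumPy style as the other snippets here (the target spectral radius of 1.25 is an assumed, tunable value, typically chosen around 1), could look like this:

from numpy import *   # for random and linalg, as in the snippets below

resSize = 1000                           # as in the question
W = random.rand(resSize, resSize) - 0.5
# spectral radius = largest absolute eigenvalue of the random matrix
# (numpy.linalg.eig is fine for a sketch; a sparse eigensolver is
# faster for large reservoirs)
rhoW = max(abs(linalg.eig(W)[0]))
# rescale W to the desired spectral radius
W *= 1.25 / rhoW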

Now, after properly initializing the two weight matrices, you start presenting inputs to the reservoir. For each input presented to the reservoir, the activations are calculated, and these activations are the state of the ESN. The figure below shows a plot of 200 activations for 20 inputs.

[Figure: reservoir neuron activations over time]

So, after presenting all inputs to the ESN, the states are collected into a matrix X. This is the code snippet that does this in Python:

# allocate the design (collected states) matrix, as in the MATLAB code
X = zeros((1+inSize+resSize, trainLen-initLen))
x = zeros((resSize,1))
for t in range(trainLen):
    u = data[t]
    x = (1-a)*x + a*tanh( dot( Win, vstack((1,u)) ) + dot( W, x ) )
    if t >= initLen:
        X[:,t-initLen] = vstack((1,u,x))[:,0]

The state of the ESN is therefore a function of the finite history of the inputs presented to the network. Now, in order to predict the output from these states, the only thing that has to be learned is how to couple the outputs to the reservoir units, i.e. the hidden-to-output connections:

# train the output
reg = 1e-8  # regularization coefficient
X_T = X.T
Wout = dot( dot(Yt,X_T), linalg.inv( dot(X,X_T) + \
    reg*eye(1+inSize+resSize) ) )

Then, after the network has been trained, its predictive capability is tested on the test portion of the data. Generative mode means that you start with a particular value of the time series and use it to predict the next value, but then you use the predicted value to predict the following one, and so on. In effect you are generating the time series, hence "generative mode". It allows you to predict multiple steps into the future, as opposed to predictive mode, where you take one true value from the time series and predict just the next one.
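As a minimal sketch in the same style (assuming that Wout, Win, W, a, x and data carry over from the training phase, and that outSize and testLen are set as in the question), the two modes differ only in which value is fed back as the next input:

Y = zeros((outSize,testLen))
u = data[trainLen]                 # the last input seen during training
for t in range(testLen):
    x = (1-a)*x + a*tanh( dot( Win, vstack((1,u)) ) + dot( W, x ) )
    y = dot( Wout, vstack((1,u,x)) )
    Y[:,t] = y
    u = y                          # generative mode: feed the prediction back
    # u = data[trainLen+t+1]       # predictive mode: feed the true next value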

This is why the ESN seems to be doing such a good job: the target signal is pretty complex, and yet in generative mode it tracks it very well.

Finally, as far as the "minimal" in minimal implementation goes, I guess it refers to the size of the reservoir (1000), which apparently is pretty small.

answered Nov 10 '22 by kostas