Is it possible to implement an MLP mixture-of-experts (ME) model in Keras? Could you please guide me with simple Keras code for a binary problem with 2 experts?
It needs a cost function like this:
```python
import theano.tensor as T  # the cost below is written with Theano's tensor module

g = gate.layers[-1].output
o1 = mlp1.layers[-1].output
o2 = mlp2.layers[-1].output

def ME_objective(y_true, y_pred):
    A = g[0] * T.exp(-0.5*T.sqr(y_true - o1))
    B = g[1] * T.exp(-0.5*T.sqr(y_true - o2))
    return -T.log((A+B).sum())  # cost
```
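Written out as a formula, and assuming that g[0] and g[1] are meant to be the two gate outputs for each sample and that .sum() runs over the N samples of a batch, the cost above reads:

```latex
A_n = g_0(x_n)\, \exp\!\left(-\tfrac{1}{2}\,\bigl(y_n - o_1(x_n)\bigr)^2\right), \qquad
B_n = g_1(x_n)\, \exp\!\left(-\tfrac{1}{2}\,\bigl(y_n - o_2(x_n)\bigr)^2\right)

\mathcal{L} = -\log \sum_{n=1}^{N} \left( A_n + B_n \right)
```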
You can definitely model such a structure in Keras with a Merge layer, which lets you combine several inputs. Here is an SSCCE (short, self-contained, correct example) that you'll hopefully be able to adapt to your structure:
```python
# Keras 1.x API: the Merge layer is not available in current Keras versions
import numpy as np
from keras.engine import Merge
from keras.models import Sequential
from keras.layers import Dense
import keras.backend as K

xdim = 4
ydim = 1

gate = Sequential([Dense(2, input_dim=xdim)])  # gating network: one weight per expert
mlp1 = Sequential([Dense(1, input_dim=xdim)])  # expert 1
mlp2 = Sequential([Dense(1, input_dim=xdim)])  # expert 2

def merge_mode(branches):
    g, o1, o2 = branches
    # I'd have liked to write
    # return o1 * K.transpose(g[:, 0]) + o2 * K.transpose(g[:, 1])
    # but it doesn't work, and I don't know enough Keras to solve it.
    # o1 and o2 have shape (batch, 1) while g[:, i] has shape (batch,), so the
    # expert outputs are transposed to (1, batch), weighted, then transposed back.
    return K.transpose(K.transpose(o1) * g[:, 0] + K.transpose(o2) * g[:, 1])

model = Sequential()
model.add(Merge([gate, mlp1, mlp2], output_shape=(ydim,), mode=merge_mode))
model.compile(optimizer='Adam', loss='mean_squared_error')

train_size = 19
nb_inputs = 3  # one input tensor for each branch (g, o1, o2)
x_train = [np.random.random((train_size, xdim)) for _ in range(nb_inputs)]
y_train = np.random.random((train_size, ydim))
model.fit(x_train, y_train)
```
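The transposes in merge_mode are only there to get the broadcasting right. The NumPy sketch below uses NumPy as a stand-in for the Keras backend (the batch size of 5 is arbitrary). It shows why the naive product has the wrong shape, and that slicing the gate as g[:, 0:1] gives the same result as the transpose trick. That slicing should carry over to Keras backend tensors too, but treat that as an assumption:

```python
import numpy as np

batch = 5
rng = np.random.default_rng(0)
g = rng.random((batch, 2))   # gate outputs, one row per sample
o1 = rng.random((batch, 1))  # expert 1 output, shape (batch, 1)
o2 = rng.random((batch, 1))  # expert 2 output, shape (batch, 1)

# Naive version: (batch, 1) * (batch,) broadcasts to (batch, batch), the wrong shape
naive = o1 * g[:, 0] + o2 * g[:, 1]

# Transpose trick from merge_mode: (1, batch) * (batch,) -> (1, batch), then back to (batch, 1)
transposed = (o1.T * g[:, 0] + o2.T * g[:, 1]).T

# Equivalent: keep each gate column 2-D so broadcasting pairs rows with rows
sliced = o1 * g[:, 0:1] + o2 * g[:, 1:2]
```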
Here is an implementation of the objective you described. There are a few mathematical concerns to keep in mind, though (see below).

```python
def me_loss(y_true, y_pred):
    # y_pred is unused: the loss reads the branch outputs of the Merge model directly
    g = gate.layers[-1].output
    o1 = mlp1.layers[-1].output
    o2 = mlp2.layers[-1].output
    A = g[:, 0] * K.transpose(K.exp(-0.5 * K.square(y_true - o1)))
    B = g[:, 1] * K.transpose(K.exp(-0.5 * K.square(y_true - o2)))
    return -K.log(K.sum(A+B))

# [...] edit the compile line from the example above
model.compile(optimizer='Adam', loss=me_loss)
```
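To sanity-check me_loss outside of Keras, here is a NumPy mirror of it. The name me_loss_np and the toy arrays are hypothetical, made up for illustration. When both experts predict y exactly, every exponential equals 1, so the loss reduces to -log of the sum of all gate outputs in the batch:

```python
import numpy as np

def me_loss_np(y_true, g, o1, o2):
    """NumPy mirror of me_loss: y_true, o1, o2 have shape (batch, 1); g has shape (batch, 2)."""
    a = g[:, 0] * np.exp(-0.5 * np.square(y_true - o1)).T  # shape (1, batch)
    b = g[:, 1] * np.exp(-0.5 * np.square(y_true - o2)).T  # shape (1, batch)
    # As in the Keras version, the sum runs over the whole batch before the log
    return -np.log(np.sum(a + b))

y = np.array([[0.0], [1.0], [1.0]])
g = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])  # each row sums to 1
perfect = me_loss_np(y, g, o1=y, o2=y)              # -log(3.0): sum of all gates is 3
off_by_one = me_loss_np(y, g, y + 1, y + 1)         # each term scaled by exp(-0.5)
```

Because the sum runs over the whole batch before the log, perfectly normalised gates give -log(N) for a batch of N samples rather than 0; the analysis that follows reasons about a single x.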
Short version: somewhere in your model, I think there should be at least one constraint (maybe two):

- For any x, sum(g(x)) = 1
- For any x, g0(x) > 0 and g1(x) > 0 (might not be strictly necessary)
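One way to satisfy both constraints at once (a suggestion, not part of the original answer) is to put a softmax on the gate's output, e.g. `Dense(2, input_dim=xdim, activation='softmax')` for the gate in the Keras model. The NumPy sketch below checks the two properties on random, deliberately large gate logits:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(42)
logits = rng.normal(scale=10.0, size=(100, 2))  # unconstrained gate pre-activations
g = softmax(logits)

sums_to_one = np.allclose(g.sum(axis=1), 1.0)   # first constraint: sum(g(x)) = 1
all_positive = bool(np.all(g > 0))              # second constraint: g0(x), g1(x) > 0
```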
Domain study
1. If o1(x) and o2(x) are infinitely far from y:
   - A -> ±0 and B -> ±0, depending on the signs of g0(x) and g1(x)
   - cost -> +infinite or nan
2. If o1(x) and o2(x) are infinitely close to y:
   - A -> g0(x) and B -> g1(x)
   - cost -> -log(sum(g(x)))
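The two limit cases can be checked numerically on a single sample. In this sketch, the helper cost and the values of y, g0 and g1 are hypothetical; "infinitely far" is approximated by an error of 100, which is enough for exp to underflow to 0 in float64:

```python
import numpy as np

def cost(y, g0, g1, o1, o2):
    """Single-sample version of the objective: -log(A + B)."""
    a = g0 * np.exp(-0.5 * (y - o1) ** 2)
    b = g1 * np.exp(-0.5 * (y - o2) ** 2)
    return -np.log(a + b)

y = 1.0
g0, g1 = 0.3, 0.7

# Case 2: experts exactly right, so A = g0, B = g1 and cost = -log(g0 + g1)
close = cost(y, g0, g1, o1=y, o2=y)

# Case 1: experts far from y, so A and B underflow to 0 and the cost goes to +inf
with np.errstate(divide="ignore"):
    far = cost(y, g0, g1, o1=y + 100.0, o2=y - 100.0)
```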
The problem is that log is only defined on (0, +inf), which means that for the objective to always be defined, there needs to be a constraint somewhere ensuring sum(A(x) + B(x)) > 0 for any x. A more restrictive version of that constraint would be g0(x) > 0 and g1(x) > 0.
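To see the domain problem concretely, this sketch (with a hypothetical single-sample helper cost and made-up gate values) compares unconstrained gates of mixed signs, where A + B < 0 and the log returns nan, with strictly positive gates:

```python
import numpy as np

def cost(y, g0, g1, o1, o2):
    """Single-sample version of the objective: -log(A + B)."""
    a = g0 * np.exp(-0.5 * (y - o1) ** 2)
    b = g1 * np.exp(-0.5 * (y - o2) ** 2)
    return -np.log(a + b)

y = 0.0
with np.errstate(invalid="ignore"):
    # Mixed-sign gates: A + B = -0.6 < 0, so log is undefined and NumPy returns nan
    bad = cost(y, g0=-0.8, g1=0.2, o1=0.0, o2=0.0)

# Strictly positive gates make A + B > 0 (up to float underflow), so log is defined
good = cost(y, g0=0.8, g1=0.2, o1=0.0, o2=0.0)
```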
Convergence
An even more important concern is that this objective does not seem to be designed to converge towards 0. When mlp1 and mlp2 start predicting y correctly (case 2.), there is currently nothing preventing the optimizer from making sum(g(x)) tend towards +infinite, which makes the loss tend towards -infinite.
Ideally, we'd like loss -> 0, i.e. sum(g(x)) -> 1.
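The convergence issue can be illustrated with perfect experts (case 2.), where the cost for one sample is just -log(sum(g(x))). In this sketch the helper cost_perfect_experts and the gate values are hypothetical; softmax normalisation is used as one example of enforcing sum(g(x)) = 1:

```python
import numpy as np

def cost_perfect_experts(g):
    """Case 2 cost for one sample: with o1 = o2 = y, A = g0 and B = g1."""
    return -np.log(np.sum(g))

g = np.array([0.3, 0.7])

# Unconstrained gate: scaling g up keeps lowering the cost without bound
scaled = [cost_perfect_experts(c * g) for c in (1.0, 10.0, 1000.0)]

# Softmax-normalised gate: sum(g) = 1, so the cost is 0 (up to rounding) at the optimum
logits = np.array([2.0, -1.0])
g_soft = np.exp(logits) / np.exp(logits).sum()
normalised = cost_perfect_experts(g_soft)
```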