Loss functions in TensorFlow (with an if-else)

Tags:

tensorflow

I am trying different loss functions in TensorFlow.

The loss function I want is a kind of epsilon-insensitive function (applied componentwise):

if |yData - yModel| < epsilon:
    loss = 0
else:
    loss = |yData - yModel|

I tried this solution:

yData = tf.placeholder("float", [None, numberOutputs])

yModel = model(...

epsilon = 0.2
epsilonTensor = epsilon * tf.ones_like(yData)
loss = tf.maximum(tf.abs(yData - yModel) - epsilonTensor, tf.zeros_like(yData))
optimizer = tf.train.GradientDescentOptimizer(0.25)
train = optimizer.minimize(loss)

I also used

optimizer = tf.train.MomentumOptimizer(0.001,0.9)

I do not find any error in the implementation. However, it does not converge, while loss = tf.square(yData - yModel) converges, and loss = tf.maximum(tf.square(yData - yModel) - epsilonTensor, tf.zeros_like(yData)) also converges.

So I also tried something simpler, loss = tf.abs(yData - yModel), and it also does not converge. Am I making some mistake, or is this a problem with the non-differentiability of abs at zero, or something else? What is happening with the abs function?

Asked by DanielTheRocketMan on Jan 31 '16

1 Answer

When your loss is something like Loss(x) = abs(x - y), the solution is an unstable fixed point of SGD -- start your minimization at a point arbitrarily close to the solution, and the next step will increase the loss.

Having a stable fixed point is a requirement for the convergence of an iterative procedure like SGD. In practice this means your optimization will move towards a local minimum, but after getting close enough, it will jump around the solution with steps proportional to the learning rate. Here's a toy TensorFlow program that illustrates the problem:

import tensorflow as tf
from matplotlib import pyplot

x = tf.Variable(0.)
loss_op = tf.abs(x - 1.05)   # gradient magnitude is 1 everywhere except at the kink
opt = tf.train.GradientDescentOptimizer(0.1)
train_op = opt.minimize(loss_op)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
xvals = []
for i in range(20):
  unused, loss, xval = sess.run([train_op, loss_op, x])
  xvals.append(xval)
pyplot.plot(xvals)

[Graph of the x estimate: it oscillates around 1.05 instead of converging]

Some solutions to the problem:

  1. Use a more robust solver such as the Proximal Gradient Method
  2. Use a more SGD-friendly loss function such as the Huber loss (see the sketch below)
  3. Use a learning rate schedule to gradually decrease the learning rate
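
For (2), here is a minimal sketch of a Huber-style loss on the same toy problem (the delta = 0.5 threshold is just an illustrative choice, not something from the question). Inside the quadratic region the gradient shrinks with the error, so plain SGD can settle at the minimum instead of jumping around it:

import tensorflow as tf
from matplotlib import pyplot

x = tf.Variable(0.)
delta = 0.5                           # width of the quadratic region (illustrative choice)
err = tf.abs(x - 1.05)
quadratic = tf.minimum(err, delta)    # quadratic part near the minimum
linear = err - quadratic              # linear part for large errors
loss_op = 0.5 * tf.square(quadratic) + delta * linear   # Huber loss
opt = tf.train.GradientDescentOptimizer(0.1)
train_op = opt.minimize(loss_op)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
xvals = []
for i in range(40):
  unused, loss, xval = sess.run([train_op, loss_op, x])
  xvals.append(xval)
pyplot.plot(xvals)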

Here's a way to implement (3) on the toy problem above:

x = tf.Variable(0.)
loss_op = tf.abs(x-1.05)

step = tf.Variable(0, trainable=False)   # global step, incremented by minimize()
learning_rate = tf.train.exponential_decay(
      0.2,   # Base learning rate.
      step,  # Current step count.
      1,     # Decay steps.
      0.9    # Decay rate.
)

opt = tf.train.GradientDescentOptimizer(learning_rate)
train_op = opt.minimize(loss_op, global_step=step)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
xvals = []
for i in range(40):
  unused, loss, xval = sess.run([train_op, loss_op, x])
  xvals.append(xval)
pyplot.plot(xvals)

[Plot: the x estimate converging to 1.05 as the learning rate decays]
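
With decay_steps = 1 and decay_rate = 0.9, the learning rate at step t is 0.2 * 0.9^t, so each update shrinks the step size by 10% and the oscillations around 1.05 die out geometrically instead of persisting.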

Answered by Yaroslav Bulatov