Loss functions in TensorFlow (with an if-else)

Tags:

tensorflow

I am trying different loss functions in TensorFlow.

The loss function I want is a kind of epsilon-insensitive function (applied componentwise):

if |yData - yModel| < epsilon:
    loss = 0
else:
    loss = |yData - yModel|

I tried this solution:

yData = tf.placeholder("float", [None, numberOutputs])

yModel = model(...

epsilon = 0.2
epsilonTensor = epsilon * tf.ones_like(yData)
loss = tf.maximum(tf.abs(yData - yModel) - epsilonTensor, tf.zeros_like(yData))
optimizer = tf.train.GradientDescentOptimizer(0.25)
train = optimizer.minimize(loss)

I also used

optimizer = tf.train.MomentumOptimizer(0.001,0.9)

I do not find any error in the implementation. However, it does not converge, while loss = tf.square(yData - yModel) converges, and loss = tf.maximum(tf.square(yData - yModel) - epsilonTensor, tf.zeros_like(yData)) also converges.

So I also tried something simpler, loss = tf.abs(yData - yModel), and it also does not converge. Am I making some mistake, or is this a problem with the non-differentiability of abs at zero, or something else? What is happening with the abs function?

Asked by DanielTheRocketMan on Jan 31 '16

1 Answer

When your loss is something like Loss(x) = abs(x - y), the solution is an unstable fixed point of SGD -- start your minimization at a point arbitrarily close to the solution, and the next step will increase the loss.

Having a stable fixed point is a requirement for the convergence of an iterative procedure like SGD. In practice this means your optimization will move towards a local minimum, but after getting close enough, it will jump around the solution with steps proportional to the learning rate. Here's a toy TensorFlow program that illustrates the problem:

import tensorflow as tf
from matplotlib import pyplot

x = tf.Variable(0.)
loss_op = tf.abs(x - 1.05)   # gradient magnitude is 1 everywhere except at the kink
opt = tf.train.GradientDescentOptimizer(0.1)
train_op = opt.minimize(loss_op)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
xvals = []
for i in range(20):
  unused, loss, xval = sess.run([train_op, loss_op, x])
  xvals.append(xval)
pyplot.plot(xvals)

[Graph of the x estimate: it oscillates around 1.05 instead of converging]

Some solutions to the problem:

  1. Use a more robust solver such as the Proximal Gradient Method
  2. Use a more SGD-friendly loss function such as the Huber loss (see the sketch below)
  3. Use a learning rate schedule to gradually decrease the learning rate
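
For (2), here is a minimal sketch of a Huber-style loss on the same toy problem (the delta = 0.5 threshold is just an illustrative choice, not something from the question). Inside the quadratic region the gradient shrinks with the error, so plain SGD can settle at the minimum instead of jumping around it:

import tensorflow as tf
from matplotlib import pyplot

x = tf.Variable(0.)
delta = 0.5                           # width of the quadratic region (illustrative choice)
err = tf.abs(x - 1.05)
quadratic = tf.minimum(err, delta)    # quadratic part near the minimum
linear = err - quadratic              # linear part for large errors
loss_op = 0.5 * tf.square(quadratic) + delta * linear   # Huber loss
opt = tf.train.GradientDescentOptimizer(0.1)
train_op = opt.minimize(loss_op)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
xvals = []
for i in range(40):
  unused, loss, xval = sess.run([train_op, loss_op, x])
  xvals.append(xval)
pyplot.plot(xvals)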

Here's a way to implement (3) on the toy problem above:

x = tf.Variable(0.)
loss_op = tf.abs(x-1.05)

step = tf.Variable(0, trainable=False)   # global step, incremented by minimize()
learning_rate = tf.train.exponential_decay(
      0.2,   # Base learning rate.
      step,  # Current step count.
      1,     # Decay steps.
      0.9    # Decay rate.
)

opt = tf.train.GradientDescentOptimizer(learning_rate)
train_op = opt.minimize(loss_op, global_step=step)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
xvals = []
for i in range(40):
  unused, loss, xval = sess.run([train_op, loss_op, x])
  xvals.append(xval)
pyplot.plot(xvals)

[Plot: the x estimate converging to 1.05 as the learning rate decays]
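
With decay_steps = 1 and decay_rate = 0.9, the learning rate at step t is 0.2 * 0.9^t, so each update shrinks the step size by 10% and the oscillations around 1.05 die out geometrically instead of persisting.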

Answered by Yaroslav Bulatov