 

Adding a preprocessing layer to keras model and setting tensor values

How would one best add a preprocessing layer (e.g., subtract mean and divide by std) to a keras (v2.0.5) model so that the model becomes fully self-contained for deployment (possibly in a C++ environment)? I tried:

    def getmodel():
        model = Sequential()
        mean_tensor = K.placeholder(shape=(1,1,3), name="mean_tensor")
        std_tensor = K.placeholder(shape=(1,1,3), name="std_tensor")

        preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
                               input_shape=im_shape)

        model.add(preproc_layer)

        # Build the remaining model, perhaps set weights,
        ...

        return model

Then, somewhere else set the mean/std on the model. I found the set_value function so tried the following:

m = getmodel()
mean, std = get_mean_std(..)

graph = K.get_session().graph
mean_tensor = graph.get_tensor_by_name("mean_tensor:0")
std_tensor = graph.get_tensor_by_name("std_tensor:0")

K.set_value(mean_tensor, mean)
K.set_value(std_tensor, std)

However the set_value fails with

AttributeError: 'Tensor' object has no attribute 'assign'

So set_value does not work as the (limited) docs suggest. What would the proper way be to do this? Get the TF session, wrap all the training code in a with (session) block and use feed_dict? I would have thought there would be a native keras way to set tensor values.
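For illustration, the session/feed_dict route I have in mind would look roughly like this (TensorFlow backend assumed; batch is just an illustrative input array). Since placeholders can only be fed, never assigned, the values would have to be supplied on every call, which rather defeats the point of a self-contained model:

sess = K.get_session()
# mean_tensor and std_tensor are fetched by name as in the snippet above
preds = sess.run(m.output,
                 feed_dict={m.input: batch,
                            mean_tensor: mean,
                            std_tensor: std})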

Instead of using a placeholder I tried setting the mean/std on model construction using either K.variable or K.constant:

mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")

This avoids any set_value problems. However, I notice that if I try to train that model (which I know is not particularly efficient, as you are re-doing the normalisation for every image), it works, but at the end of the first epoch the ModelCheckpoint handler fails with a very deep stack trace:

...
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 102, in save_model
  'config': model.get_config()
File "/Users/dgorissen/Library/Python/2.7/lib/python/site-packages/keras/models.py", line 1193, in get_config
  return copy.deepcopy(config)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 163, in deepcopy
  y = copier(x, memo)
...
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 190, in deepcopy
  y = _reconstruct(x, rv, 1, memo)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 343, in _reconstruct
  y.__dict__.update(state)
AttributeError: 'NoneType' object has no attribute 'update'

Update 1:

I also tried a different approach. Train a model as normal, then just prepend a second model that does the preprocessing:

# Regular model, trained as usual
model = ...

# Preprocessing model
preproc_model = Sequential()
mean_tensor = K.constant(mean, name="mean_tensor")
std_tensor = K.constant(std, name="std_tensor")
preproc_layer = Lambda(lambda x: (x - mean_tensor) / (std_tensor + K.epsilon()),
                       input_shape=im_shape, name="normalisation")
preproc_model.add(preproc_layer)

# Prepend the preprocessing model to the regular model    
full_model = Model(inputs=[preproc_model.input],
                   outputs=[model(preproc_model.output)])

# Save the complete model to disk
full_model.save('full_model.hdf5')

This seems to work until the save() call, which fails with the same deep stack trace as above. Perhaps the Lambda layer is the problem, but judging from this issue it seems it should serialise properly.

So overall, how do I add a normalisation layer to a keras model without compromising the ability to serialise it (and export it to pb)?

I'm sure you can get it working by dropping down to TF directly (e.g. this thread, or using tf.Transform), but I would have thought it would be possible in keras directly.

Update 2:

So I found that the deep stack trace could be avoided by doing

def foo(x):
    bar = K.variable(baz, name="baz")
    return x - bar

So defining bar inside the function instead of capturing from the outside scope.

I then found I could save to disk but could not load from disk. There is a suite of GitHub issues around this. I used the workaround specified in #5396 to pass all variables in as arguments; this then allowed me to save and load.
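For reference, my understanding of that workaround is roughly the following (an illustrative sketch; mean and std are plain numpy arrays here, and passing them through the Lambda arguments dict keeps them out of the closure that get_config()/deepcopy chokes on):

def normalise(x, mean=None, std=None):
    return (x - mean) / (std + K.epsilon())

preproc_layer = Lambda(normalise,
                       arguments={'mean': mean, 'std': std},
                       input_shape=im_shape,
                       name="normalisation")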

Thinking I was almost there, I continued with my approach from Update 1 above of stacking a preprocessing model in front of a trained model. This then led to "Model is not compiled" errors. I worked around those, but in the end I never managed to get the following to work:

  • Build and train a model
  • Save it to disk
  • Load it, prepend a preprocessing model
  • Export the stacked model to disk as a frozen pb file
  • Load the frozen pb from disk
  • Apply it on some unseen data (a rough sketch of these last two steps follows this list)
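For reference, loading and applying a frozen graph would look roughly like this (TF 1.x API; the tensor names and unseen_data are illustrative, not the actual ones in my graph):

import tensorflow as tf

graph_def = tf.GraphDef()
with open('full_model.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    inp = graph.get_tensor_by_name('input_1:0')      # illustrative node names
    out = graph.get_tensor_by_name('output_node:0')
    with tf.Session(graph=graph) as sess:
        preds = sess.run(out, feed_dict={inp: unseen_data})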

I got it to the point where there were no errors, but could not get the normalisation tensors to propagate through to the frozen pb. Having spent too much time on this I then gave up and switched to the somewhat less elegant approach of:

  • Build a model with the preprocessing operations in the model from the start but set to a no-op (mean=0, std=1)
  • Train the model, build an identical model but this time with the proper values for mean/std.
  • Transfer the weights
  • Export and freeze the model to pb

All this now fully works as expected. Small overhead on training but negligible for me.
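For completeness, a rough sketch of the weight-transfer and freezing steps (TF 1.x backend assumed; build_model and the file names are illustrative helpers, not my actual code):

import tensorflow as tf
from keras import backend as K

trained = build_model(mean=0.0, std=1.0)   # no-op preprocessing, as trained
trained.load_weights('trained_weights.h5')

deploy = build_model(mean=mean, std=std)   # proper normalisation baked in
deploy.set_weights(trained.get_weights())

sess = K.get_session()
frozen = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(),
    [out.op.name for out in deploy.outputs])
tf.train.write_graph(frozen, '.', 'full_model.pb', as_text=False)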

I still failed to figure out how one would set the value of a tensor variable in keras (without raising the assign exception), but I can do without it for now.

Will accept @Daniel's answer as it got me going in the right direction.

Related question:

  • Add Tensorflow pre-processing to existing Keras model (for use in Tensorflow Serving)
asked Jun 29 '17 by dgorissen


1 Answer

When creating a variable, you must give it the "value", not the shape:

mean_tensor = K.variable(mean, name="mean_tensor")
std_tensor = K.variable(std, name="std_tensor")

Now, in Keras, you don't have to deal with sessions, graphs and things like that. You work only with layers, and inside Lambda layers (or loss functions) you may work with tensors.

For our Lambda layer, we need a more complex function, because shapes must match before you do a calculation. Since I don't know im_shape, I assumed it has 3 dimensions:

def myFunc(x):

    #reshape x in a way it's compatible with the tensors mean and std:
    x = K.reshape(x,(-1,1,1,3)) 
        #-1 is like a wildcard, it will be the value that matches the rest of the given shape.     
        #I chose (1,1,3) because it's the same shape of mean_tensor and std_tensor

    result = (x - mean_tensor) / (std_tensor + K.epsilon())

    #now shape it back to the same shape it was before (which I don't know)    
    return K.reshape(result,(-1,im_shape[0], im_shape[1], im_shape[2]))
        #-1 is still necessary, it's the batch size

Now we create the Lambda layer, considering it also needs an output shape (because of your custom operation, the system does not necessarily know the output shape):

model.add(Lambda(myFunc,input_shape=im_shape, output_shape=im_shape))

After this, just compile the model and train it. (Often with model.compile(...) and model.fit(...))
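For example (the optimiser, loss and data arrays below are placeholders for whatever your actual task uses):

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32)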


If you want to include everything, computing the mean and std inside the function as well, that's OK too:

def myFunc(x):

    mean_tensor = K.mean(x,axis=[0,1,2]) #considering shapes of (size, width, height, channels)    
    std_tensor = K.std(x,axis=[0,1,2])

    x = K.reshape(x, (-1,3)) #shapes of mean and std are (3,) here.    
    result = (x - mean_tensor) / (std_tensor + K.epsilon())

    return K.reshape(result,(-1,width,height,3))

Now, all this is extra calculation in your model and will consume processing time. It's better to just do everything outside the model. Create the preprocessed data first and store it, then create the model without this preprocessing layer. This way you get a faster model. (This can be important if your data or your model is very big.)
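For example, a minimal sketch of doing the normalisation offline (x_train is an illustrative numpy array; the statistics are computed once over the whole training set):

import numpy as np

mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
std = x_train.std(axis=(0, 1, 2), keepdims=True)

x_train_norm = (x_train - mean) / (std + 1e-7)
np.save('x_train_norm.npy', x_train_norm)   # store once, reuse for every training run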

answered Oct 14 '22 by Daniel Möller