Not fully connected layer in tensorflow

I want to create a network where in the input layer nodes are just connected to some nodes in the next layer. Here is a small example:

[Figure: example network in which input node i1 is not connected to hidden node h1]

My solution so far is to set the weight of the edge between i1 and h1 to zero and, after every optimization step, multiply the weights by a matrix (I call it the mask matrix) in which every entry is 1 except the entry for the edge between i1 and h1. (See the code below.)

Is this approach right? Or does it affect the gradient descent? Is there another way to create this kind of network in TensorFlow?

import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np

tf.enable_eager_execution()


model = tf.keras.Sequential([
  tf.keras.layers.Dense(2, activation=tf.sigmoid, input_shape=(2,)),  # input shape required
  tf.keras.layers.Dense(2, activation=tf.sigmoid)
])


#set the weights
weights = [np.array([[0, 0.25], [0.2, 0.3]]), np.array([0.35, 0.35]),
           np.array([[0.4, 0.5], [0.45, 0.55]]), np.array([0.6, 0.6])]

model.set_weights(weights)

model.get_weights()

features = tf.convert_to_tensor([[0.05,0.10 ]])
labels =  tf.convert_to_tensor([[0.01,0.99 ]])


mask = np.array([[0, 1], [1, 1]])

#define the loss function
def loss(model, x, y):
  y_ = model(x)
  return tf.losses.mean_squared_error(labels=y, predictions=y_)

#define the gradient calculation
def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return loss_value, tape.gradient(loss_value, model.trainable_variables) 

#create optimizer and global step
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
global_step = tf.train.get_or_create_global_step()


#optimization step
loss_value, grads = grad(model, features, labels)
optimizer.apply_gradients(zip(grads, model.variables),global_step)

#mask the optimized weights of the first layer
#(set_weights expects the full list of weights, so mask only the first kernel)
weights = model.get_weights()
weights[0] = weights[0] * mask
model.set_weights(weights)


2 Answers

If you are looking for a solution for the specific example you provided, you can simply use tf.keras Functional API and define two Dense layers where one is connected to both neurons in the previous layer and the other one is only connected to one of the neurons:

from tensorflow.keras.layers import Input, Lambda, Dense, concatenate
from tensorflow.keras.models import Model

inp = Input(shape=(2,))
inp2 = Lambda(lambda x: x[:,1:2])(inp)   # get the second neuron 

h1_out = Dense(1, activation='sigmoid')(inp2)  # only connected to the second neuron
h2_out = Dense(1, activation='sigmoid')(inp)  # connected to both neurons
h_out = concatenate([h1_out, h2_out])

out = Dense(2, activation='sigmoid')(h_out)

model = Model(inp, out)

# simply train it using `fit`
model.fit(...)
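
For completeness, here is a minimal sketch of what that training call might look like with the example inputs from the question (the optimizer, loss and epoch count are arbitrary choices for illustration, not part of the original answer):

import numpy as np

# the single training example from the question
x_train = np.array([[0.05, 0.10]])
y_train = np.array([[0.01, 0.99]])

# compile with an arbitrary optimizer/loss and fit on that example
model.compile(optimizer='sgd', loss='mse')
model.fit(x_train, y_train, epochs=100, verbose=0)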


The problem with your solution (and some of the others suggested in this post) is that it does not prevent training of this weight. Gradient descent is allowed to train the non-existent weight, which is then overwritten retrospectively. You end up with a network that has a zero in the desired location, but training suffers: the backpropagation calculation never sees the masking step, because it is not part of the TensorFlow graph, so gradient descent follows a path that assumes this weight has an effect on the outcome (it does not).
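
A quick way to see this (just a sketch, reusing the grad helper, features, labels and model from the question's eager-mode code) is to inspect the gradient of the first-layer kernel; the entry for the masked i1-h1 weight is generally nonzero, so gradient descent keeps trying to update it:

# gradient w.r.t. the first-layer kernel, using the question's code
_, grads = grad(model, features, labels)
# the (0, 0) entry corresponds to the i1 -> h1 edge; it is typically
# nonzero even though the weight itself has been masked to 0
print(grads[0].numpy())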

A better solution is to include the masking step as part of the TensorFlow graph, so that it is factored into the gradient descent. Since the masking step is simply an element-wise multiplication by your sparse, binary mask matrix, you can include the mask as an element-wise multiplication in the graph definition using tf.multiply.

Sadly this means saying goodbye to the user-friendly tf.keras.layers methods and embracing a more nuts-and-bolts approach to TensorFlow. I can't see an obvious way to do it using the layers API.

See the implementation below; I have tried to provide comments explaining what happens at each stage.

import tensorflow as tf

## Graph definition for model

# set up tf.placeholders for inputs x, and outputs y_
# these remain fixed during training and can have values fed to them during the session
with tf.name_scope("Placeholders"):
    x = tf.placeholder(tf.float32, shape=[None, 2], name="x")   # input layer
    y_ = tf.placeholder(tf.float32, shape=[None, 2], name="y_") # output layer

# set up tf.Variables for the weights at each layer and set their initial values
# also set up the mask as a variable and make it untrainable
with tf.name_scope("Variables"):
    w_l1_values = [[0, 0.25],[0.2,0.3]]
    w_l1 = tf.Variable(w_l1_values, name="w_l1")
    w_l2_values = [[0.4,0.5],[0.45, 0.55]]
    w_l2 = tf.Variable(w_l2_values, name="w_l2")

    mask_values = [[0., 1.], [1., 1.]]
    mask = tf.Variable(mask_values, trainable=False, name="mask")


# link each set of weights as matrix multiplications in the graph. Include an elementwise multiplication by mask.
# Sequence takes us from inputs x to output final_out, which will be compared to labels fed to placeholder y_
l1_out = tf.nn.relu(tf.matmul(x, tf.multiply(w_l1, mask)), name="l1_out")
final_out = tf.nn.relu(tf.matmul(l1_out, w_l2), name="output")


## define loss function and training operation
with tf.name_scope("Loss"):
    # some loss defined as a function of graph output: final_out and labels: y_
    loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=final_out, labels=y_, name="loss")

with tf.name_scope("Train"):
    # some optimisation strategy, arbitrary learning rate
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001, name="optimizer_adam")
    train_op = optimizer.minimize(loss, name="train_op")


# create session, initialise variables and train according to inputs and corresponding labels
# This should show that the values of the first layer weights change, but the one set to 0 remains at 0
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    initial_l1_weights = sess.graph.get_tensor_by_name("Variables/w_l1:0")
    print(initial_l1_weights.eval())

    inputs = [[0.05, 0.10]]
    labels = [[0.01, 0.99]]
    ans = sess.run(train_op, feed_dict={"Placeholders/x:0": inputs, "Placeholders/y_:0": labels})

    train_steps = 1
    for i in range(train_steps):
        initial_l1_weights = sess.graph.get_tensor_by_name("Variables/w_l1:0")
    print(initial_l1_weights.eval())

Or use the answer provided by today for a Keras-friendly option.
