
Understanding Keras model architecture (node index of nested model)

This script defines a dummy model containing a small nested model:

from keras.layers import Input, Dense
from keras.models import Model
import keras

input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)

input = Input(shape=(5,), name='input')
x = Dense(4, name='dense_1')(input)
x = inner_model(x)
x = Dense(2, name='dense_2')(x)

output = keras.layers.concatenate([x, x], name='concat_1')
model = Model(inputs=input, outputs=output)

print(model.summary())

yields the following output:

Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input (InputLayer)               (None, 5)             0                                            
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 4)             24          input[0][0]                      
____________________________________________________________________________________________________
model_1 (Model)                  (None, 3)             15          dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 2)             8           model_1[1][0]                    
____________________________________________________________________________________________________
concat_1 (Concatenate)           (None, 4)             0           dense_2[0][0]                    
                                                                   dense_2[0][0]                    

My question concerns the content of the Connected to column. I understand that a layer can have multiple nodes.

The notation of this column is layer_name[node_index][tensor_index].

If we regard inner_model as a layer I would expect it to have only one node, so I would expect dense_2 to be connected to model_1[0][0]. But in reality it is connected to model_1[1][0]. Why is this the case?

Asked Sep 02 '17 by Tobias Hermann

2 Answers

1. Background

When you say:

If we regard inner_model as a layer I would expect it to have only one node

This is true in the sense that it has only one node which is part of the network.

Consider the source of the model.summary function in the Keras GitHub repository. The function that prints the connections is print_layer_summary_with_connections (line 76), and it considers only the nodes from the relevant_nodes array. All the nodes that are not in this array are considered not part of the network, so the function skips them. The relevant lines are lines 88-90:

if relevant_nodes and node not in relevant_nodes:
    # node is not part of the current network
    continue
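For reference, here is a hedged sketch of how each Connected to entry could be assembled from a node, mirroring the logic of print_layer_summary_with_connections (the attribute names come from the old keras.engine.topology Node API quoted further below; connected_to is an illustrative name, not a Keras function):

def connected_to(node):
    # Render each inbound connection as layer_name[node_index][tensor_index]
    connections = []
    for layer, node_index, tensor_index in zip(
            node.inbound_layers, node.node_indices, node.tensor_indices):
        connections.append('%s[%d][%d]' % (layer.name, node_index, tensor_index))
    return connections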

2. Your model

Now let's see what happens with your particular model. First let us define relevant_nodes:

relevant_nodes = []
for v in model.nodes_by_depth.values():
    relevant_nodes += v

The array relevant_nodes looks like:

[<keras.engine.topology.Node at 0x9dfa518>,
 <keras.engine.topology.Node at 0x9dfa278>,
 <keras.engine.topology.Node at 0x9d8bac8>,
 <keras.engine.topology.Node at 0x9d8ba58>,
 <keras.engine.topology.Node at 0x9d74518>]

However, when we print the inbound nodes at every layer, we will get:

for i in model.layers:
    print(i.inbound_nodes)

[<keras.engine.topology.Node object at 0x0000000009D74518>]
[<keras.engine.topology.Node object at 0x0000000009D8BA58>]
[<keras.engine.topology.Node object at 0x0000000009D743C8>, <keras.engine.topology.Node object at 0x0000000009D8BAC8>]
[<keras.engine.topology.Node object at 0x0000000009DFA278>]
[<keras.engine.topology.Node object at 0x0000000009DFA518>]

You can see that there is exactly one node in the list above that does not appear in relevant_nodes. This is the node in position 0 in the third array:

<keras.engine.topology.Node object at 0x0000000009D743C8>

It was not considered a part of the model, and hence did not appear in relevant_nodes. The node in position 1 in this array does appear in relevant_nodes, and this is why you see it as model_1[1][0].
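You can verify this membership directly (a minimal sketch, assuming the model and relevant_nodes from above are in scope; model.layers[2] is the nested model, listed as model_1 in the summary):

inner_as_layer = model.layers[2]  # the nested model, listed as model_1
for idx, node in enumerate(inner_as_layer.inbound_nodes):
    print(idx, node in relevant_nodes)

# expected:
# 0 False   <- the node that is not part of the network
# 1 True    <- the node shown as model_1[1][0]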

3. The reason

The reason for this is basically the application of inner_model as a layer (the line x = inner_model(x)). Even if you run a much smaller model, such as the one below:

input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)

input = Input(shape=(5,), name='input')
output = inner_model(input)

model = Model(inputs=input, outputs=output)

You will see that relevant_nodes contains two elements, while via

for i in model.layers:
    print(i.inbound_nodes)

you'll get three nodes.

This is because layer 1 (of the smaller model above) has two nodes, but only the second one is considered part of the model. In particular, if you print the input at each one of the nodes at layer 1 with layer.get_input_at(node_index), you'll get:

print(model.layers[1].get_input_at(0))
print(model.layers[1].get_input_at(1))

#prints
/input_inner
/input

4. Answers to the questions in the comment

1) Do you also know what this non-relevant node is good for / where it comes from?

This node seems to be an "internal node" created during the application of inner_model. In particular, if you print the input and output shape at each one of the three nodes (in the small model above), you get:

nodes = [model.layers[0].inbound_nodes[0],
         model.layers[1].inbound_nodes[0],
         model.layers[1].inbound_nodes[1]]
for i in nodes:
    print(i.input_shapes)
    print(i.output_shapes)
    print(" ")

#prints
[(None, 5)]
[(None, 5)]

[(None, 4)]
[(None, 3)]

[(None, 5)]
[(None, 3)]

So you can see that the shapes of the middle node (the one that does not appear in the list of relevant nodes) correspond to the shapes of inner_model.
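A quick sketch to confirm this (assuming the small model above is in scope; Node exposes input_shapes and output_shapes, as used above):

internal_node = model.layers[1].inbound_nodes[0]  # the skipped node
assert internal_node.input_shapes == [inner_model.input_shape]    # [(None, 4)]
assert internal_node.output_shapes == [inner_model.output_shape]  # [(None, 3)]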

2) Will an inner model with n output nodes always present them with node indices 1 to n instead of 0 to n-1?

I am not sure if this is always the case, as I guess there are various ways to end up with several output nodes, but if I consider the following quite natural generalization of the small model above, it is indeed the case:

input_inner = Input(shape=(4,), name='input_inner')
output_inner = Dense(3, name='inner_dense')(input_inner)
inner_model = Model(inputs=input_inner, outputs=output_inner)

input = Input(shape=(5,), name='input')
output = inner_model(input)
output = inner_model(output)

model = Model(inputs=input, outputs=output)

print(model.summary())

Here I just added output = inner_model(output) to the small model. The list of relevant nodes is

[<keras.engine.topology.Node at 0xd10c390>,
 <keras.engine.topology.Node at 0xd10c9b0>,
 <keras.engine.topology.Node at 0xd10ca20>]

and the list of all inbound nodes is

[<keras.engine.topology.Node object at 0x000000000D10CA20>]
[<keras.engine.topology.Node object at 0x000000000D10C588>, <keras.engine.topology.Node object at 0x000000000D10C9B0>, <keras.engine.topology.Node object at 0x000000000D10C390>]

Indeed the node indices are 1 and 2, as you mentioned in the comment. It continues similarly if I add another output = inner_model(output), with node indices being 1, 2, 3, and so on. A self-contained variant is sketched below.
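Here is a self-contained sketch of that generalization with consistent shapes (so it also builds under a shape-checking backend; names are illustrative):

from keras.layers import Input, Dense
from keras.models import Model

inner_input = Input(shape=(3,), name='inner_input')
inner_model = Model(inner_input, Dense(3, name='inner_dense')(inner_input))

input = Input(shape=(3,), name='input')
output = input
for _ in range(3):
    output = inner_model(output)  # each call adds a new node to inner_model

model = Model(inputs=input, outputs=output)
print(model.summary())  # Connected to should show node indices 1, 2, 3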

Answered Nov 04 '22 by Miriam Farber

Updated in Sep 2020. The selected answer was a bit outdated (the link no longer points to the right place), and it did not exactly answer the question: why is the node index 1 in model_1[1][0]? Here's what I found.

The code I played with is below (I added names to the layers for easier reading). You can copy and run it to see the output info.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

input_inner = layers.Input(shape=(4,), name='inn_input')
output_inner = layers.Dense(3, name='inn_dense')(input_inner)
inner_model = keras.Model(inputs=input_inner, outputs=output_inner,name='inn_model')

inn_allLayers = inner_model.layers
# print(type(inn_allLayers))
print(inner_model.name,': total layer number:',len(inn_allLayers))
for i in inn_allLayers:
    print(i.name, i)
    print(len(i._inbound_nodes))
    for n in i._inbound_nodes:
        print(n.get_config())
        print(n)
    print('===================')
print('************************************************')

nest_input = layers.Input(shape=(5,), name='nest_input')
nest_d1_out = layers.Dense(4, name='nest_dense_1')(nest_input)
nest_m_out = inner_model(nest_d1_out)
nest_d2_out = layers.Dense(2, name='nest_dense_2')(nest_m_out)

nest_add_out = layers.concatenate([nest_d2_out, nest_d2_out], name='nest_concat')
model = keras.Model(inputs=nest_input, outputs=nest_add_out,name='nest_model')

inn_allLayers = inner_model.layers
# print(type(inn_allLayers))
print(inner_model.name,': total layer number:',len(inn_allLayers))
for i in inn_allLayers:
    print(i.name, i)
    print(len(i._inbound_nodes))
    for n in i._inbound_nodes:
        print(n.get_config())
        print(n)
    print('===================')
print('************************************************')

allLayers = model.layers
# print(type(allLayers))
print(model.name,': total layer number:',len(allLayers))
for i in allLayers:
    print(i.name, i)
    print(len(i._inbound_nodes))
    for n in i._inbound_nodes:
        print(n.get_config())
        print(n)
    print('===================')

# Note: listing the graph ops this way uses the TF1-style graph API; under
# TF2 you would need tf.compat.v1 with eager execution disabled.
for op in tf.compat.v1.get_default_graph().get_operations():
    print(str(op.name))

1. [1][0] represents [node_index][tensor_index]

2. What is node_index?

Under tensorflow/python/keras/engine/base_layer.py, it's described in this class:

class KerasHistory(
    collections.namedtuple('KerasHistory',
                           ['layer', 'node_index', 'tensor_index'])):
  """Tracks the Layer call that created a Tensor, for Keras Graph Networks.

  During construction of Keras Graph Networks, this metadata is added to
  each Tensor produced as the output of a Layer, starting with an
  `InputLayer`. This allows Keras to track how each Tensor was produced, and
  this information is later retraced by the `keras.engine.Network` class to
  reconstruct the Keras Graph Network.

  Attributes:
    layer: The Layer that produced the Tensor.
    node_index: The specific call to the Layer that produced this Tensor. Layers
      can be called multiple times in order to share weights. A new node is
      created every time a Tensor is called.
    tensor_index: The output index for this Tensor. Always zero if the Layer
      that produced this Tensor only has one output. Nested structures of
      Tensors are deterministically assigned an index via `nest.flatten`.
  """
  # Added to maintain memory and performance characteristics of `namedtuple`
  # while subclassing.

It says a Node is created every time a Tensor is produced by a layer call. To me, this is a bit vague. My understanding is that when a layer is called, it produces a Tensor, and different calls to the same layer create multiple nodes (some print results are shown later).
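A small sketch to illustrate this (using tf.keras and the private _inbound_nodes attribute, consistent with the code above; the layer names are made up):

import tensorflow as tf
from tensorflow.keras import layers

shared = layers.Dense(2, name='shared')
a = layers.Input(shape=(4,))
b = layers.Input(shape=(4,))
_ = shared(a)  # first call: creates node 0
_ = shared(b)  # second call: creates node 1
print(len(shared._inbound_nodes))  # expected: 2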

3. How to print each node?

Under the same py file, there is this snippet:

# Create node, add it to inbound nodes.
Node(
    self,
    inbound_layers=inbound_layers,
    node_indices=node_indices,
    tensor_indices=tensor_indices,
    input_tensors=input_tensors,
    output_tensors=output_tensors,
    arguments=arguments)

# Update tensor history metadata.
# The metadata attribute consists of
# 1) a layer instance
# 2) a node index for the layer
# 3) a tensor index for the node.
# This allows layer reuse (multiple nodes per layer) and multi-output
# or multi-input layers (e.g. a layer can return multiple tensors,
# and each can be sent to a different layer).
for i, tensor in enumerate(nest.flatten(output_tensors)):
  tensor._keras_history = KerasHistory(self,
                                       len(self._inbound_nodes) - 1, i)

Here self refers to the Layer object. The info is recorded in each tensor's _keras_history attribute and in the layer's self._inbound_nodes list. Hence we can print exactly the node with print(layer._inbound_nodes[index_of_node].get_config()); the runnable code is already included at the beginning.

(What are inbound and outbound nodes? It is a bit confusing at first look, but if you imagine each node as an arrow pointing from one layer to another, it becomes easier. The code description is below.)

class Node(object):
  """A `Node` describes the connectivity between two layers.

  Each time a layer is connected to some new input,
  a node is added to `layer._inbound_nodes`.
  Each time the output of a layer is used by another layer,
  a node is added to `layer._outbound_nodes`.

  Arguments:
      outbound_layer: the layer that takes
          `input_tensors` and turns them into `output_tensors`
          (the node gets created when the `call`
          method of the layer was called).
      inbound_layers: a list of layers, the same length as `input_tensors`,
          the layers from where `input_tensors` originate.
      node_indices: a list of integers, the same length as `inbound_layers`.
          `node_indices[i]` is the origin node of `input_tensors[i]`
          (necessary since each inbound layer might have several nodes,
          e.g. if the layer is being shared with a different data stream).
      tensor_indices: a list of integers,
          the same length as `inbound_layers`.
          `tensor_indices[i]` is the index of `input_tensors[i]` within the
          output of the inbound layer
          (necessary since each inbound layer might
          have multiple tensor outputs, with each one being
          independently manipulable).
      input_tensors: list of input tensors.
      output_tensors: list of output tensors.
      arguments: dictionary of keyword arguments that were passed to the
          `call` method of the layer at the call that created the node.

  `node_indices` and `tensor_indices` are basically fine-grained coordinates
  describing the origin of the `input_tensors`.

  A node from layer A to layer B is added to:
    - A._outbound_nodes
    - B._inbound_nodes
  """

4. Observe node creation.

You might notice there are two identical print blocks for inner_model in the code: one before the nested model is built, and one after.

The output is as below:

inn_model : total layer number: 2
inn_input <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fd1c6755780>
1
{'outbound_layer': 'inn_input', 'inbound_layers': [], 'node_indices': [], 'tensor_indices': []}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e75e10>
===================
inn_dense <tensorflow.python.keras.layers.core.Dense object at 0x7fd1d2e75e80>
1
{'outbound_layer': 'inn_dense', 'inbound_layers': 'inn_input', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e92550>
===================
************************************************
inn_model : total layer number: 2
inn_input <tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7fd1c6755780>
1
{'outbound_layer': 'inn_input', 'inbound_layers': [], 'node_indices': [], 'tensor_indices': []}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e75e10>
===================
inn_dense <tensorflow.python.keras.layers.core.Dense object at 0x7fd1d2e75e80>
2
{'outbound_layer': 'inn_dense', 'inbound_layers': 'inn_input', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2e92550>
{'outbound_layer': 'inn_dense', 'inbound_layers': 'nest_dense_1', 'node_indices': 0, 'tensor_indices': 0}
<tensorflow.python.keras.engine.base_layer.Node object at 0x7fd1d2ac4358>
===================
************************************************

You will notice immediately that after the nested model is built, one extra (inbound) node (or arrow) has been created, pointing to inn_dense. One arrow points from inn_input to inn_dense; another points from nest_dense_1 to inn_dense. This confirms what was said earlier: each time a layer is called, a new node (an arrow) is created.

5. Question answered

So far, I think this already explains the original question: why the node index is 1 in [1][0]. It is because reusing inner_model causes the inn_dense layer to be used to create a Tensor a second time.
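You can confirm this from the tensor itself (a hedged check via the private _keras_history attribute described above, assuming the code from the beginning is in scope):

layer, node_index, tensor_index = nest_m_out._keras_history
print(layer.name, node_index, tensor_index)  # expected: inn_model 1 0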

The rest of the code snippet contains a bit of extra information; you can check it out to get a better idea of what happens under the hood.

Answered Nov 04 '22 by Jason