Internals of Variable in TensorFlow

Tags:

tensorflow

I am exploring how a variable is represented in the graph. I create a variable, initialize it, and take a graph snapshot after each step:

import tensorflow as tf

def dump_graph(g, filename):
    # Write the graph's GraphDef protobuf as text for inspection.
    with open(filename, 'w') as f:
        print(g.as_graph_def(), file=f)

g = tf.get_default_graph()
var = tf.Variable(2)
dump_graph(g, 'data/after_var_creation.graph')

init = tf.global_variables_initializer()
dump_graph(g, 'data/after_initializer_creation.graph')

with tf.Session() as sess:
    sess.run(init)
    dump_graph(g, 'data/after_initializer_run.graph')

The graph after variable creation looks like this:

node {
  name: "Variable/initial_value"
  op: "Const"
  attr {
    key: "dtype"
    value {
      type: DT_INT32
    }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_INT32
        tensor_shape {
        }
        int_val: 2
      }
    }
  }
}
node {
  name: "Variable"
  op: "VariableV2"
  attr {
    key: "container"
    value {
      s: ""
    }
  }
  attr {
    key: "dtype"
    value {
      type: DT_INT32
    }
  }
  attr {
    key: "shape"
    value {
      shape {
      }
    }
  }
  attr {
    key: "shared_name"
    value {
      s: ""
    }
  }
}
node {
  name: "Variable/Assign"
  op: "Assign"
  input: "Variable"
  input: "Variable/initial_value"
  attr {
    key: "T"
    value {
      type: DT_INT32
    }
  }
  attr {
    key: "_class"
    value {
      list {
        s: "loc:@Variable"
      }
    }
  }
  attr {
    key: "use_locking"
    value {
      b: true
    }
  }
  attr {
    key: "validate_shape"
    value {
      b: true
    }
  }
}
node {
  name: "Variable/read"
  op: "Identity"
  input: "Variable"
  attr {
    key: "T"
    value {
      type: DT_INT32
    }
  }
  attr {
    key: "_class"
    value {
      list {
        s: "loc:@Variable"
      }
    }
  }
}
versions {
  producer: 21
}

There are four nodes: Variable/initial_value, Variable, Variable/Assign, Variable/read.

After the init operation is created (running it does not change the graph further), one more node is added:

node {
  name: "init"
  op: "NoOp"
  input: "^Variable/Assign"
}

I do not have a firm grasp of what happens here.

  1. Could anybody explain the precise meaning of these nodes?
  2. What is the purpose of explicit variable initialization in the TensorFlow Python API? Why can't a variable be initialized automatically when the variable object is created, or uninitialized variables be initialized inside Session.run()?
  3. What is the meaning of the "loc:@" syntax inside the Variable/read node and "^Variable/Assign" inside the init node?
  4. How does retrieving a variable's value work? I suppose the value is stored inside the session and that session.run() substitutes it in somewhere, but I do not know the gory details.
Alexander Lobov asked Dec 14 '22


1 Answer

The implementation of TensorFlow's tf.Variable class lives in the source repository, in tensorflow/python/ops/variables.py. The Python wrapper class is responsible for creating several nodes in the dataflow graph and for providing convenient accessors for using them. I'll use the names from your example to make things clear:

  • Node Variable of type VariableV2 is the stateful TensorFlow op that owns the memory for the variable. Every time you run that op, it emits the buffer (as a "ref tensor") so that other ops can read or write it.

  • Node Variable/initial_value (of type Const) is the tensor that you provided as the initial_value argument of the tf.Variable constructor. This can be any type of tensor, although commonly it's a tf.random_*() op used for random weight initialization. The suffix initial_value implies that it was probably created by passing a non-tensor that was implicitly converted to a tensor.

  • Node Variable/Assign of type Assign is the initializer operation that writes the initial value into the variable's memory. It is typically run once, when you do sess.run(tf.global_variables_initializer()) later in your program.

  • Node Variable/read of type Identity is an operation that "dereferences" the Variable op's "ref tensor" output. This is mostly an implementation detail, but it provides desirable behavior when a variable is read multiple times across process boundaries: in particular, the value is only copied once, because the output of this op is not a "ref tensor". (If instead the "ref" edge itself were partitioned between processes, TensorFlow would copy the variable multiple times. That behavior is occasionally useful, e.g. if you want to see the effect of a write on a different device in the same step, but it's quite niche.) The sketch after this list shows how each of these nodes is exposed on the Python object.
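
Here is a minimal sketch (TF 1.x, assuming a fresh graph so that the default names match the dump above) showing how the tf.Variable wrapper exposes each of the four nodes:

import tensorflow as tf

var = tf.Variable(2)

# The stateful op that owns the variable's buffer.
print(var.op.name, var.op.type)                  # Variable VariableV2
# The constant used to initialize it.
print(var.initial_value.op.name,                 # Variable/initial_value
      var.initial_value.op.type)                 # Const
# The initializer op that writes the initial value into the buffer.
print(var.initializer.name,                      # Variable/Assign
      var.initializer.type)                      # Assign
# The snapshot read that dereferences the "ref tensor".
print(var.value().op.name,                       # Variable/read
      var.value().op.type)                       # Identity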

Variable initialization in TensorFlow is explicit, and this can cause headaches (e.g. if you forget to run the initializers for all of your variables). However, the reason we don't do it implicitly is that there are many ways to initialize a variable: from a tensor, from a checkpoint, or from another process (when doing between-graph replication). TensorFlow can't guess which one you intend, so it makes the process explicit.
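
As a sketch of that point (TF 1.x; the checkpoint path is hypothetical), here are a few of those initialization paths, plus a helper that catches the forgot-to-initialize headache:

import tensorflow as tf

var = tf.Variable(2)
saver = tf.train.Saver()

with tf.Session() as sess:
    # Reports names of variables whose buffers haven't been written yet.
    print(sess.run(tf.report_uninitialized_variables()))  # [b'Variable']

    sess.run(var.initializer)   # 1. from the initial_value tensor
    # saver.restore(sess, '/tmp/model.ckpt')  # 2. from a checkpoint (hypothetical path)
    var.load(7, sess)           # 3. from a Python value, bypassing initial_value

    print(sess.run(tf.report_uninitialized_variables()))  # []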

The "loc:@Variable" syntax is used to colocate nodes on the same device. In particular, any op that has this value for its _class attr will be placed on the same device as the Variable operation.

Retrieving the value of a variable is quite simple: the variable op outputs a tensorflow::Tensor value, and this value can be copied back through the tensorflow::Session::Run() API. The Python bindings then copy this value into a NumPy array.
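
For example (TF 1.x), fetching the variable in Session.run() hands you its current value as a NumPy object:

import tensorflow as tf

var = tf.Variable(2)
with tf.Session() as sess:
    sess.run(var.initializer)
    value = sess.run(var)       # copies the current value out of the session
    print(value, type(value))   # 2 <class 'numpy.int32'>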

mrry answered Dec 29 '22