Why does TensorFlow use channel-last ordering instead of row-major?

In most TensorFlow tutorials, authors use channel-last dimension ordering, e.g.

input_layer = tf.reshape(features, [-1, 28, 28, 1])

where the last dimension represents the number of channels (https://www.tensorflow.org/tutorials/layers). Being used to Theano and NumPy (both use C ordering, i.e. row-major), I find this awkward. Moreover, having read the documentation on in-memory layout schemes in TensorFlow, I reckon the channel-last layout will cause more cache misses, because convolutions are carried out on individual channels, while in channel-last ordering these channels are intermixed in linear memory, effectively shrinking the cache by a factor of N (where N is the number of channels), which is especially inefficient in 3D and 4D convolutions. Am I getting something wrong?
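For reference, the layout difference is easy to inspect with NumPy strides (a minimal sketch using a hypothetical 3-channel image; both arrays are C-ordered, i.e. row-major):

    import numpy as np

    # One hypothetical 28x28 RGB image in both layouts, both stored C-ordered (row-major).
    nhwc = np.zeros((1, 28, 28, 3), dtype=np.float32)         # channel-last
    nchw = np.ascontiguousarray(nhwc.transpose(0, 3, 1, 2))   # channel-first

    # Byte strides per dimension: in NHWC the channel stride is the smallest,
    # so the 3 channel values of a pixel sit next to each other in memory.
    print(nhwc.strides)  # (9408, 336, 12, 4)
    print(nchw.strides)  # (9408, 3136, 112, 4)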

P.S.

I've found a closely related thread (Tensorflow 3 channel order of color inputs). The author of the accepted answer states that TF uses row-major by default, but given that all of the tutorials I've found so far show channel-last ordering, I find that claim misleading.

asked Jun 27 '17 by Eli Korvigo


2 Answers

Here's the explanation:

https://www.tensorflow.org/performance/performance_guide#use_nchw_image_data_format

Image data format refers to the representation of batches of images. TensorFlow supports NHWC (TensorFlow default) and NCHW (cuDNN default). N refers to the number of images in a batch, H refers to the number of pixels in the vertical dimension, W refers to the number of pixels in the horizontal dimension, and C refers to the channels (e.g. 1 for black and white, 3 for RGB, etc.) Although cuDNN can operate on both formats, it is faster to operate in its default format.

The best practice is to build models that work with both NCHW and NHWC as it is common to train using NCHW on GPU, and then do inference with NHWC on CPU.

The very brief history of these two formats is that TensorFlow started by using NHWC because it was a little faster on CPUs. Then the TensorFlow team discovered that NCHW performs better when using the NVIDIA cuDNN library. The current recommendation is that users support both formats in their models. In the long term, we plan to rewrite graphs to make switching between the formats transparent.

Moreover, digging into the code, we can see here that when the input is in NHWC format, TensorFlow converts it to NCHW for you.

  if (data_format == FORMAT_NHWC) {
    // Convert the input tensor from NHWC to NCHW.
    TensorShape nchw_shape =
        ShapeFromFormat(FORMAT_NCHW, in_batch, in_rows, in_cols, in_depths);
    if (in_depths > 1) {
      Tensor transformed_input;
      OP_REQUIRES_OK(ctx, ctx->allocate_temp(DataTypeToEnum<T>::value,
                                             nchw_shape, &transformed_input));
      functor::NHWCToNCHW<GPUDevice, T, 4>()(
          ctx->eigen_device<GPUDevice>(),
          const_cast<const Tensor&>(input).tensor<T, 4>(),
          transformed_input.tensor<T, 4>());
      input = transformed_input;
    } else {
      // If depth <= 1, then just reshape.
      CHECK(input.CopyFrom(input, nchw_shape));
    }
  }

You can specify the data format you want to use for every operation, but by default TensorFlow uses NHWC, not NCHW. That's why even the TF developers still use NHWC: it avoids having to specify the format in every operation.
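As a sketch of what specifying the format per operation looks like (a minimal TF 2.x example; the shapes are made up, and NCHW convolutions are generally implemented only for GPU):

    import tensorflow as tf

    kernel = tf.random.normal([3, 3, 1, 8])      # conv2d filters are always HWIO
    x_nhwc = tf.random.normal([4, 28, 28, 1])

    # Default layout: NHWC, so no data_format argument is needed.
    y = tf.nn.conv2d(x_nhwc, kernel, strides=1, padding="SAME")

    # The same convolution in channel-first layout; guarded because the
    # NCHW kernels are typically only available on GPU.
    if tf.config.list_physical_devices("GPU"):
        x_nchw = tf.transpose(x_nhwc, perm=[0, 3, 1, 2])
        y_nchw = tf.nn.conv2d(x_nchw, kernel, strides=1, padding="SAME",
                              data_format="NCHW")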

answered Oct 09 '22 by nessuno


Your question is based on a misunderstanding.

There is no contradiction between row-major and NHWC. Row-major means that the rightmost index is the one that causes the smallest jumps in memory when it changes, and changes in the leftmost index cause the biggest jumps. In row-major, the last dimension is contiguous; in column-major, the first one is. See https://en.wikipedia.org/wiki/Row-_and_column-major_order#Address_calculation_in_general for how to calculate memory offsets for an arbitrary number of dimensions.
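Concretely, the row-major offset of element (n, h, w, c) in an NHWC tensor of shape (N, H, W, C) is ((n*H + h)*W + w)*C + c, which is easy to verify with NumPy (a small illustrative sketch):

    import numpy as np

    def nhwc_offset(n, h, w, c, shape):
        # Row-major: the rightmost index (c) varies fastest.
        N, H, W, C = shape
        return ((n * H + h) * W + w) * C + c

    x = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)  # C-ordered by default
    assert x[1, 2, 3, 1] == x.ravel()[nhwc_offset(1, 2, 3, 1, x.shape)]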

So, TF's memory IS laid out in row-major. The differences between the orderings of the indexes are subtle (some people even prefer CHWN - see https://github.com/soumith/convnet-benchmarks/issues/66#issuecomment-155944875). NCHW is popular because it's what cuDNN does best. But basically every common memory layout in DL is row-major.

answered Oct 09 '22 by etarion