How does Docker calculate the hash of each layer? Is it deterministic?

I looked for this information in the official Docker docs, but had no success.

Which pieces of information does Docker take into account when calculating the hash of each commit/layer?

It seems pretty obvious that the line in the Dockerfile is part of the hash and, of course, so is the parent commit hash. But is anything else taken into account when calculating this hash?

Concrete use case: suppose two developers on different machines, at different points in time (and therefore with different Docker daemons and different caches), run $ docker build ... against the same Dockerfile. The FROM ... directive gives them the same starting point, but will each subsequent operation produce the same hash on both machines? Is it deterministic?

Victor Schröder asked Mar 31 '16 17:03



1 Answer

Thanks, @thaJeztah. The answer is in https://gist.github.com/aaronlehmann/b42a2eaf633fc949f93b#id-definitions-and-calculations

  1. layer.DiffID: ID for an individual layer

    Calculation: DiffID = SHA256hex(uncompressed layer tar data)

  2. layer.ChainID: ID for a layer and its parents. This ID uniquely identifies a filesystem composed of a set of layers.

    Calculation:

    • For bottom layer: ChainID(layer0) = DiffID(layer0)
    • For other layers: ChainID(layerN) = SHA256hex(ChainID(layerN-1) + " " + DiffID(layerN))
  3. image.ID: ID for an image. Since the image configuration references the layers the image uses, this ID incorporates the filesystem data and the rest of the image configuration.

    Calculation: SHA256hex(imageConfigJSON)
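
The three calculations above can be sketched in a few lines of Python. This is only an illustration, not Docker's actual code: the helper names (make_layer_tar, diff_id, chain_ids, image_id) are hypothetical, the in-memory tar is a stand-in for a real layer, and the sketch assumes digests are concatenated in their prefixed "sha256:<hex>" string form. Because the DiffID is a hash of the raw tar bytes, file metadata stored in the tar (such as modification times) feeds into it, which suggests why two builds of the same Dockerfile at different times can end up with different IDs.

```python
import hashlib
import io
import tarfile
from typing import Dict, List


def sha256_digest(data: bytes) -> str:
    """Return a digest string in the 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def make_layer_tar(files: Dict[str, bytes], mtime: int) -> bytes:
    """Build an uncompressed tar in memory (hypothetical stand-in for a layer)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime  # timestamps are part of the tar header
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def diff_id(layer_tar: bytes) -> str:
    """DiffID = SHA256hex(uncompressed layer tar data)."""
    return sha256_digest(layer_tar)


def chain_ids(diff_ids: List[str]) -> List[str]:
    """ChainID(layer0) = DiffID(layer0); later layers hash 'parent + space + DiffID'."""
    chain: List[str] = []
    for d in diff_ids:
        if not chain:
            chain.append(d)
        else:
            chain.append(sha256_digest((chain[-1] + " " + d).encode("utf-8")))
    return chain


def image_id(config_json: bytes) -> str:
    """image.ID = SHA256hex(imageConfigJSON), hashed byte for byte."""
    return sha256_digest(config_json)


if __name__ == "__main__":
    files = {"app/hello.txt": b"hello\n"}
    first = diff_id(make_layer_tar(files, mtime=1459443780))
    again = diff_id(make_layer_tar(files, mtime=1459443780))
    later = diff_id(make_layer_tar(files, mtime=1667246400))
    print(first == again)  # identical bytes in, identical DiffID out
    print(first == later)  # same file contents, different mtime
    print(chain_ids([first, later]))
```

In this sketch the hashing itself is fully deterministic: identical tar bytes always give the same DiffID and ChainID. Any difference between two machines would have to come from the inputs, for example file timestamps inside the layer tar or fields of the image configuration JSON.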

robrich answered Oct 31 '22 20:10