
How are Docker image layer IDs derived?

Tags:

docker

I'm trying to understand how Docker image layer IDs are derived.

On a Linux-based VM, I pull an Ubuntu 20.04 image as follows:

docker pull ubuntu:20.04

I then save it as a tar file and extract it:

docker save ubuntu:20.04 > ubuntu2004.tar
tar -xvf ubuntu2004.tar

I have mounted a folder into my VM, so I can see the extracted tar on my Windows machine:

[Screenshot: Docker layer structure]

As you may know, the 4 folders contain the 4 layers of the image, and the long GUID-like folder names are the layer IDs. Inside each folder there is a json text file containing a JSON object, and that object holds the same layer ID as well. For example, for one layer the ID is 1c87ad44cc6b364480a5340ab1050b8dfb1691ed2abc85a1dbc3ee2fb5f2cf06.

Question: How are these IDs derived?

Here is a summary of my research so far.

  1. One article I read says that they are randomly generated.

The diff directory for storing the layer content, is now named after a randomly generated 'cache ID', and the Docker Engine maintains the link between the layer and its cache ID, so that it knows where to locate the layer's content on disk.

I have spun up multiple VMs, pulled the same ubuntu:20.04 image on each, and extracted it, only to find that the layer IDs are exactly the same every time. So I concluded that the Docker engine on my VMs is not generating those IDs randomly. Either it uses some logic to generate them, or the registry it pulls from already has those IDs.

Jessica G digs into Docker layers here and says the same thing: layer IDs are randomly generated.

Along with each step, the layer created is listed represented by its random generated ID.

  2. I also came across this article, which describes the ChainID. I was able to correctly compute the image ID and the DiffIDs as described there. Next come the ChainIDs: for the bottom layer, it says the ChainID is the same as the DiffID.

For bottom layer: ChainID(layer0) = DiffID(layer0)

For other layers: ChainID(layerN) = SHA256hex(ChainID(layerN-1) + " " + DiffID(layerN))

However, I observed that for every layer, the ID is different from the DiffID. Maybe I am missing something here, or the post is outdated.

  3. This post by Graham Jenson concludes that “The file names and folder structure don’t matter” (scroll to the end to see it).
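
The ChainID recurrence quoted from the ChainID article above can be reproduced with a small bash sketch. It assumes GNU coreutils' sha256sum and treats every digest as a full "sha256:<hex>" string, both as hash input and as output, which is how the OCI image spec writes ChainIDs. The function name chain_ids is hypothetical and not part of any Docker tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: chain_ids DIFFID...
# Prints one ChainID per line for the DiffIDs given (bottom layer first), using
#   ChainID(layer0) = DiffID(layer0)
#   ChainID(layerN) = sha256(ChainID(layerN-1) + " " + DiffID(layerN))
# Every digest is a full "sha256:<hex>" string, both as hash input and as output.
chain_ids() {
  local chain="" diff_id hex
  for diff_id in "$@"; do
    if [ -z "$chain" ]; then
      # Bottom layer: the ChainID is simply its DiffID.
      chain="$diff_id"
    else
      # Hash "<parent ChainID> <this DiffID>" (single space, no trailing newline).
      hex=$(printf '%s %s' "$chain" "$diff_id" | sha256sum | cut -d' ' -f1)
      chain="sha256:$hex"
    fi
    printf '%s\n' "$chain"
  done
}

# Usage (needs Docker and the pulled image):
#   chain_ids $(docker image inspect --format '{{range .RootFS.Layers}}{{.}} {{end}}' ubuntu:20.04)
```

The first line printed equals the bottom DiffID; none of the lines is expected to match the folder names in the docker save output.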

So far I have not figured out how the IDs are generated by the Docker engine. Or are they generated by the registry when the image is pushed, with the Docker engine pulling them as they are? I also looked at the shell script from the Moby Project described here. It generates each layer ID as a sha256 of the layer's sha256: the sha256 of the layer.tar file is obtained first, and then, I guess, a sha256 of that value is used as the layer ID. But the problem is that this ID does not match the one I found after extraction.

Any pointers in the right direction would be deeply appreciated.

asked Apr 22 '20 at 13:04 by VivekDev


People also ask

What is a layer in Docker image?

A Docker image consists of several layers. Each layer corresponds to certain instructions in your Dockerfile. The following instructions create a layer: RUN, COPY, ADD.

What are the dockerfile instructions that modify the image filesystem?

The Dockerfile instructions that modify the image filesystem are ADD, COPY and RUN. Furthermore, a Docker image is essentially a configuration object stored in JSON format. Besides image metadata such as the CMD instruction, this JSON object contains an ordered list of layers. Running docker image inspect prints those layers.

What is a content addressable ID in Docker?

These digests were introduced in Docker 1.10 and are referred to as Content Addressable IDs, because the hash value corresponds to the layer's content. Separating image objects from their layers was an important and deliberate decision, because it allows multiple images to reference one and the same layer.


1 Answer

This is a bit to unpack, because there's not just one ID. First, I'd recommend looking at the OCI spec, because it walks through what an image manifest, the config, and the individual layers are. That's what you see on a registry. Taking the example of an nginx image I've pulled down, let's go through these.

First is the docker manifest list, or OCI index:

$ regctl manifest get --list localhost:5000/library/nginx --format '{{jsonPretty .}}'
{
  "manifests": [
    {
      "digest": "sha256:7250923ba3543110040462388756ef099331822c6172a050b12c7a38361ea46f",
      "mediaType": "application\/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      },
      "size": 1570
    },
    {
      "digest": "sha256:bb1416167bc0274d8ad2eadaef292880f59a9fa67dd3dd2149a48f9ab6f3bb79",
      "mediaType": "application\/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "arm",
        "os": "linux",
        "variant": "v5"
      },
      "size": 1570
    },
    {
      "digest": "sha256:28511266c47574675169b06eddb33c759dd6b7964b87fc4b66460e24e20fdb92",
      "mediaType": "application\/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "arm",
        "os": "linux",
        "variant": "v7"
      },
      "size": 1570
    },
    {
      "digest": "sha256:bf6ac873f0bc7e0a3454ffea4ecf93145c712508a1ca4f125c82a004f5d798a5",
      "mediaType": "application\/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "arm64",
        "os": "linux",
        "variant": "v8"
      },
      "size": 1570
    },
    {
      "digest": "sha256:387da9849102b73c06e645a8f0f40c72fae755f20d33391596041a4dcf8284a9",
      "mediaType": "application\/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "386",
        "os": "linux"
      },
      "size": 1570
    },
    {
      "digest": "sha256:44b406885395b1ded51fd3e2715de09f581fdaa3b8c43837bdc4f179dd7e91e8",
      "mediaType": "application\/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "mips64le",
        "os": "linux"
      },
      "size": 1570
    },
    {
      "digest": "sha256:1312ed2db68926470e224d61d7a9a6b77b9dc8061c6b71f0a7f634e184d7aa2a",
      "mediaType": "application\/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "ppc64le",
        "os": "linux"
      },
      "size": 1570
    },
    {
      "digest": "sha256:8ce8b5ec5634ff03911fb9d2548e9d112c79274de1a6d7192f2270b0d80e4d25",
      "mediaType": "application\/vnd.docker.distribution.manifest.v2+json",
      "platform": {
        "architecture": "s390x",
        "os": "linux"
      },
      "size": 1570
    }
  ],
  "mediaType": "application\/vnd.docker.distribution.manifest.list.v2+json",
  "schemaVersion": 2
}

In there you see pointers to the manifest for each platform included in this image. This manifest list itself also has a digest, so if you're looking for image digests, realize there may be two: one for the index, and a different one for your platform-specific image:

$ docker image inspect localhost:5000/library/nginx
[                                                  
    {                                              
        "Id": "sha256:87a94228f133e2da99cb16d653cd1373c5b4e8689956386c1c12b60a20421a02",
        "RepoTags": [
            "nginx:latest",
            "localhost:5000/library/nginx:latest"
        ],                                      
        "RepoDigests": [         
            "nginx@sha256:644a70516a26004c97d0d85c7fe1d0c3a67ea8ab7ddf4aff193d9f301670cf36",
            "localhost:5000/library/nginx@sha256:644a70516a26004c97d0d85c7fe1d0c3a67ea8ab7ddf4aff193d9f301670cf36" 
        ],
...
$ regctl manifest get --list localhost:5000/library/nginx
Name:        localhost:5000/library/nginx
MediaType:   application/vnd.docker.distribution.manifest.list.v2+json
Digest:      sha256:644a70516a26004c97d0d85c7fe1d0c3a67ea8ab7ddf4aff193d9f301670cf36

Manifests:

  Name:      localhost:5000/library/nginx@sha256:7250923ba3543110040462388756ef099331822c6172a050b12c7a38361ea46f
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/amd64
...

$ regctl manifest get --list localhost:5000/library/nginx \
  --format raw-body | sha256sum
644a70516a26004c97d0d85c7fe1d0c3a67ea8ab7ddf4aff193d9f301670cf36  -

The image itself is made up of a config and an array of layers. These are stored as blobs in the registry, and everything here is content-addressable storage: the sha256sum of the content you pull is the same as the name (digest) of what you're pulling.

$ regctl manifest get localhost:5000/library/nginx@sha256:7250923ba3543110040462388756ef099331822c6172a050b12c7a38361ea46f
{        
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 
  "config": {        
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 7731,                                       
    "digest": "sha256:87a94228f133e2da99cb16d653cd1373c5b4e8689956386c1c12b60a20421a02" 
  }, 
  "layers": [ 
    {         
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 27139510,                            
      "digest": "sha256:b380bbd43752f83945df8b5d1074fef8dd044820e7d3aef33b655a2483e030c7"
    },
    { 
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 
      "size": 26638539,                              
      "digest": "sha256:fca7e12d1754baddbd07178dd1693c726e9d792c0c9659208e3f4b474dc41a7c"
    },
    { 
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 602,                                  
      "digest": "sha256:745ab57616cb3c803b3a00c3bd46fd0d94762bd5b9446eadc877cb7400fb6c11"
    },
    { 
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 894,
      "digest": "sha256:a4723e260b6fedec963910b3ae53e939fd58cbad8232258576ba26765cdcf522"
    }, 
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 666,                                  
      "digest": "sha256:1c84ebdff6819c01d0dc43a59ee391cc2cb14f7aba20cac0af1b04fb78652fc9"
    },
    { 
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 
      "size": 1394,                                  
      "digest": "sha256:858292fd2e56e240ab472db6e9fabd5fd390486660978fcf8d65a06a04c00971"
    }
  ]
}

$ regctl manifest get \
  localhost:5000/library/nginx@sha256:7250923ba3543110040462388756ef099331822c6172a050b12c7a38361ea46f \
  --format raw-body | sha256sum
7250923ba3543110040462388756ef099331822c6172a050b12c7a38361ea46f  -

The layers themselves are tar+gzip blobs that you can inspect, and, as you'd expect, they use the same content-addressable storage:

$ regctl blob get \
  localhost:5000/library/nginx sha256:b380bbd43752f83945df8b5d1074fef8dd044820e7d3aef33b655a2483e030c7 \
  | tar -tvzf - | head
drwxr-xr-x 0/0               0 2021-10-10 20:00 bin/
-rwxr-xr-x 0/0         1168776 2019-04-18 00:12 bin/bash
-rwxr-xr-x 0/0           43744 2019-02-28 10:30 bin/cat
-rwxr-xr-x 0/0           64320 2019-02-28 10:30 bin/chgrp
-rwxr-xr-x 0/0           64288 2019-02-28 10:30 bin/chmod
-rwxr-xr-x 0/0           72512 2019-02-28 10:30 bin/chown
-rwxr-xr-x 0/0          146880 2019-02-28 10:30 bin/cp
-rwxr-xr-x 0/0          121464 2019-01-17 14:08 bin/dash
-rwxr-xr-x 0/0          109408 2019-02-28 10:30 bin/date
-rwxr-xr-x 0/0           76712 2019-02-28 10:30 bin/dd

$ regctl blob get \
  localhost:5000/library/nginx sha256:b380bbd43752f83945df8b5d1074fef8dd044820e7d3aef33b655a2483e030c7 \
  | sha256sum
b380bbd43752f83945df8b5d1074fef8dd044820e7d3aef33b655a2483e030c7  -
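
The manual check above (hash the blob, compare the result with its name) is the whole content-addressable idea, so it can be wrapped in a small bash function. This is a sketch assuming GNU coreutils and a digest written as "sha256:<hex>"; verify_digest is a hypothetical name, not a regctl or docker command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify_digest FILE DIGEST
# Content-addressable check: succeeds when sha256(FILE) equals DIGEST,
# where DIGEST is written as "sha256:<hex>" like the digests in a manifest.
verify_digest() {
  local file="$1" expected="$2" actual
  case "$expected" in
    sha256:*) ;;
    *) echo "unsupported digest algorithm: $expected" >&2; return 2 ;;
  esac
  # Read the file on stdin so sha256sum prints only "<hex>  -".
  actual="sha256:$(sha256sum < "$file" | cut -d' ' -f1)"
  if [ "$actual" != "$expected" ]; then
    echo "digest mismatch: expected $expected, got $actual" >&2
    return 1
  fi
}

# Usage (needs the registry from the example above):
#   regctl blob get localhost:5000/library/nginx \
#     sha256:b380bbd43752f83945df8b5d1074fef8dd044820e7d3aef33b655a2483e030c7 >layer.tar.gz
#   verify_digest layer.tar.gz sha256:b380bbd43752f83945df8b5d1074fef8dd044820e7d3aef33b655a2483e030c7
```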

Now, when looking at docker image inspect, these IDs won't match:

        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:e81bff2725dbc0bf2003db10272fef362e882eb96353055778a66cda430cf81b",
                "sha256:43f4e41372e42dd32309f6a7bdce03cf2d65b3ca34b1036be946d53c35b503ab",
                "sha256:788e89a4d186f3614bfa74254524bc2e2c6de103698aeb1cb044f8e8339a90bd",
                "sha256:f8e880dfc4ef19e78853c3f132166a4760a220c5ad15b9ee03b22da9c490ae3b",
                "sha256:f7e00b807643e512b85ef8c9f5244667c337c314fa29572206c1b0f3ae7bf122",
                "sha256:9959a332cf6e41253a9cd0c715fa74b01db1621b4d16f98f4155a2ed5365da4a"
            ]
        },

And that's because, after the image is pulled, the content-addressable storage on the host is based on the decompressed layer; these uncompressed digests are the DiffIDs:

$ regctl blob get \
  localhost:5000/library/nginx sha256:b380bbd43752f83945df8b5d1074fef8dd044820e7d3aef33b655a2483e030c7 \
  | gunzip - | sha256sum
e81bff2725dbc0bf2003db10272fef362e882eb96353055778a66cda430cf81b  -
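
To see locally why the registry digest and the RootFS.Layers digest differ for the same layer, a bash sketch can hash one uncompressed tarball both ways. It assumes GNU gzip and coreutils and uses a hypothetical helper named layer_digests; the gzip settings are an arbitrary choice, not what any registry or builder uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: layer_digests TAR_FILE
# For an uncompressed layer tarball, prints:
#   compressed sha256:...  digest of a gzip copy (the kind a registry manifest lists)
#   diffid     sha256:...  digest of the raw tar (the kind RootFS.Layers lists)
layer_digests() {
  local tar_file="$1" compressed diff_id
  # -n leaves the file name and timestamp out of the gzip header, so repeated
  # runs with the same gzip build and level produce the same bytes.
  compressed=$(gzip -n -c < "$tar_file" | sha256sum | cut -d' ' -f1)
  diff_id=$(sha256sum < "$tar_file" | cut -d' ' -f1)
  printf 'compressed sha256:%s\n' "$compressed"
  printf 'diffid     sha256:%s\n' "$diff_id"
}

# Usage (needs the registry from the example above):
#   regctl blob get localhost:5000/library/nginx \
#     sha256:b380bbd43752f83945df8b5d1074fef8dd044820e7d3aef33b655a2483e030c7 | gunzip >layer.tar
#   layer_digests layer.tar
```

Run against the nginx layer this way, the diffid line should match e81bff2725dbc0bf2003db10272fef362e882eb96353055778a66cda430cf81b from the output above, while the compressed line is not expected to match b380bbd43752f83945df8b5d1074fef8dd044820e7d3aef33b655a2483e030c7, because recompressing generally produces different bytes depending on the tool and settings. Only the uncompressed digest stays stable.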

Lastly, there are two more complicated parts: the layer directory names inside Docker's storage, and the filenames in the save output. Looking at the code, the layer directory names are unique IDs that are randomly generated (this is the "cache ID" from the article quoted in the question). The filenames in the save file are v1-style IDs: for each layer, a legacy v1 image config is generated for the layers up to that point, and the name is a digest computed on that config. That's only done to support really old versions of Docker. You can easily create a save file with different filenames, because docker only cares about what's in manifest.json. It's even possible to mix the OCI layout definition with the docker save format and have docker load the image:

$ regctl image export \
  localhost:5000/regclient/regctl:edge@sha256:76c041ed3bf9e4186327d2d63c9c597c648a7fbc07642d63f223c899e29f8d89 \
  >export.tar

$ tar -tvf export.tar
-rw-r--r-- 0/0              30 1969-12-31 19:00 oci-layout
-rw-r--r-- 0/0             389 1969-12-31 19:00 index.json
-rw-r--r-- 0/0            1435 1969-12-31 19:00 manifest.json
drwxr-xr-x 0/0               0 1969-12-31 19:00 blobs/sha256
-rw-r--r-- 0/0            1152 1969-12-31 19:00 blobs/sha256/76c041ed3bf9e4186327d2d63c9c597c648a7fbc07642d63f223c899e29f8d89
-rw-r--r-- 0/0            3021 1969-12-31 19:00 blobs/sha256/bd45abf90c52fb2c13499bbd7bb845c106e0d5b924e65dd26c8e8e2de25e54f6
-rw-r--r-- 0/0             941 1969-12-31 19:00 blobs/sha256/f6e2d7fa40092cf3d9817bf6ff54183d68d108a47fdf5a5e476c612626c80e14
-rw-r--r-- 0/0          122412 1969-12-31 19:00 blobs/sha256/92365f35877078c3e558e9a66ac083fe9a8d44bdb3150bdac058380054b05972
-rw-r--r-- 0/0             146 1969-12-31 19:00 blobs/sha256/fa98de7a23a1c3debba4398c982decfd8b31bcfad1ac6e5e7d800375cefbd42f
-rw-r--r-- 0/0         3536065 1969-12-31 19:00 blobs/sha256/24bfb25bb9426e2205338ab1480992e9a09bd6d2d9f248d3768f4feb12ad7d9e

$ tar -xvf export.tar manifest.json 
manifest.json

$ cat manifest.json | jq .
[
  {
    "Config": "blobs/sha256/bd45abf90c52fb2c13499bbd7bb845c106e0d5b924e65dd26c8e8e2de25e54f6",
    "RepoTags": [
      "localhost:5000/regclient/regctl:edge"
    ],
    "Layers": [
      "blobs/sha256/f6e2d7fa40092cf3d9817bf6ff54183d68d108a47fdf5a5e476c612626c80e14",
      "blobs/sha256/92365f35877078c3e558e9a66ac083fe9a8d44bdb3150bdac058380054b05972",
      "blobs/sha256/fa98de7a23a1c3debba4398c982decfd8b31bcfad1ac6e5e7d800375cefbd42f",
      "blobs/sha256/24bfb25bb9426e2205338ab1480992e9a09bd6d2d9f248d3768f4feb12ad7d9e"
    ],
    "LayerSources": {
      "sha256:24bfb25bb9426e2205338ab1480992e9a09bd6d2d9f248d3768f4feb12ad7d9e": {
        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
        "digest": "sha256:24bfb25bb9426e2205338ab1480992e9a09bd6d2d9f248d3768f4feb12ad7d9e",
        "size": 3536065
      },
      "sha256:92365f35877078c3e558e9a66ac083fe9a8d44bdb3150bdac058380054b05972": {
        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
        "digest": "sha256:92365f35877078c3e558e9a66ac083fe9a8d44bdb3150bdac058380054b05972",
        "size": 122412
      },
      "sha256:f6e2d7fa40092cf3d9817bf6ff54183d68d108a47fdf5a5e476c612626c80e14": {
        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
        "digest": "sha256:f6e2d7fa40092cf3d9817bf6ff54183d68d108a47fdf5a5e476c612626c80e14",
        "size": 941
      },
      "sha256:fa98de7a23a1c3debba4398c982decfd8b31bcfad1ac6e5e7d800375cefbd42f": {
        "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
        "digest": "sha256:fa98de7a23a1c3debba4398c982decfd8b31bcfad1ac6e5e7d800375cefbd42f",
        "size": 146
      }
    }
  }
]

$ docker load <export.tar
132414a5f587: Loading layer [==================================================>]     941B/941B
482fa2862396: Loading layer [==================================================>]  122.4kB/122.4kB
8e47dcad786a: Loading layer [==================================================>]     146B/146B
65131f050950: Loading layer [==================================================>]  3.536MB/3.536MB
The image localhost:5000/regclient/regctl:edge already exists, renaming the old one with ID sha256:5ec718d68e782ea7df08e19af7b84de3c1d34b81fabe48c89a43c3439c9063dd to empty string 
Loaded image: localhost:5000/regclient/regctl:edge

(Note: I switched to a different image in this last example because the Docker engine already had the layers from the nginx example.)
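
To tie this back to the folders in the question: in the legacy layout that the question's docker save produced, each folder is named with one of these v1 IDs and holds an uncompressed layer.tar, so hashing that file should give the corresponding DiffID from docker image inspect. A minimal bash sketch, assuming that layout and GNU coreutils (the function name save_layer_diffids is hypothetical, and newer Docker releases may write an OCI-style layout instead):

```shell
#!/usr/bin/env bash
# Hypothetical helper: save_layer_diffids DIR
# DIR is an extracted legacy `docker save` archive in which every layer folder
# holds json, VERSION and an uncompressed layer.tar.
# Prints "<folder name> sha256:<sha256 of layer.tar>" per folder, in glob
# (alphabetical) order, which is not necessarily the layer stacking order.
save_layer_diffids() {
  local dir="$1" layer hex
  for layer in "$dir"/*/layer.tar; do
    # If nothing matches, the shell leaves the pattern unexpanded; skip it.
    [ -e "$layer" ] || continue
    hex=$(sha256sum < "$layer" | cut -d' ' -f1)
    printf '%s sha256:%s\n' "$(basename "$(dirname "$layer")")" "$hex"
  done
}

# Usage (in the directory where ubuntu2004.tar was extracted):
#   save_layer_diffids .
#   docker image inspect --format '{{json .RootFS.Layers}}' ubuntu:20.04
```

The two commands should report the same set of digests, while the folder names themselves match neither the DiffIDs nor the ChainIDs; the actual stacking order comes from the Layers array in manifest.json.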

answered Oct 14 '22 at 17:10 by BMitch