Since Docker v1.10, with the introduction of the content addressable storage, Docker has completely changed the way image data are handled on the disk. I understand that now layers and images are separated. Layers merely become collections of files and directories that have no notion of images and can be freely shared across images. See the update and a blog with better explanation.
During docker push
and docker pull
, via stdout it can be seen the layers are transported, though the resulting SHA hashes are completely regenerated on the destination.
With a locally built image from ubuntu:14.04
base, when I use the docker history
command, I can see a chain of intermediary images used during the build process, and the disk space usage they contributed.
root@ruifeng-VirtualBox:/var/lib/docker/aufs/diff# docker history image_size
IMAGE CREATED CREATED BY SIZE COMMENT
9ae1f372d83c 11 weeks ago /bin/sh -c #(nop) CMD ["/bin/sh" "-c" "/bin/ 0 B
aaf66e9fa85b 11 weeks ago /bin/sh -c chown -R martian /home/martian 6.299 MB
9568768134c1 11 weeks ago /bin/sh -c rm -rf /home/martian/potatoes 0 B
2f40f3f58306 11 weeks ago /bin/sh -c mv /home/martian/water_tanks /home 6.289 MB
062e2702ffa2 11 weeks ago /bin/sh -c mv /home/martian/potatoes /home/ma 5.394 kB
7b2d8b4c1dd0 11 weeks ago /bin/sh -c chown -R martian /home/martian 6.299 MB
8fd47fed98d6 11 weeks ago /bin/sh -c #(nop) COPY dir:421da6c71a1f252881 6.289 MB
...
And I can use the docker inspect
command to get the underlying layers.
root@ruifeng-VirtualBox:/var/lib/docker/aufs/diff# docker inspect image_size | jq -r '.[].RootFS'
{
"Layers": [
"sha256:a85f35566a268e6f4411c5157ffcffe4f5918b068b04d79fdd80003901ca39da",
"sha256:eaaf7298332642da0f8190fa4b96ad46c04b9c1d1682bc3a35d77bded2b1e0a9",
"sha256:33a212e8aa5642d3a2ddead146e85912407fc5bbb2a896dab11fcf329177a999",
"sha256:f1f25d8c6e56dc4891df147a77f57e756873b57f33ce95e6a0acbe47117c0c8a",
"sha256:67852b7d2cf5f0885293fa9df91ebfd8ef0c42ba11a5155f94806f3a96c5e916",
"sha256:480d48b7e2864a44c1b2fca0c7e32fbab505f7526ccb25bbfed191c04a9bb7b0",
"sha256:18d270fe64aa423e0ffdf24faf0103432027da3d5c12f4505e7daedad9fe2195",
"sha256:a73c3f5eb83790bc6d03381a43a20aef7d0d9d97de0cff4b040e8e4c01a3aee5",
"sha256:e8d1b67ace73cb92cc00725354e84024153bedae4280149c03fcb52f34d83757",
"sha256:19a4b80afc677825fec94adf8b6a45a866f42a38675f87f86e50171ff5e0a280",
"sha256:77d412270fbdd9baba1fe73028b786c3a1709feefa9b03be74b8e9f9ce148635",
"sha256:2ad21e37389addd577161c981d0c69ab60aa47945172f41f9ec71ada1c1dd4ee",
"sha256:771d1e47ca8d8dcf55069786e4c499894fba86f704c808413df00f4f980564e1",
"sha256:f9c02c6fa436213c0f220d49c4ee1b913372081010d4506757ec75d3e788847c"
],
"Type": "layers"
}
My question is, how do I link these layers marked with SHA hashes to the images listed in the IMAGE column of the previous command output? And is there a way to find out the actual location and size of these layers on the disk?
If I am not wrong, the layers should be kept at /var/lib/docker/aufs/diff
if the storage driver selection is aufs
. But the contents in that folder are named with randomly generated IDs that do not match any of the layer literally. It seems the match is only kept within Docker Engine for security concerns.
The second image contains all the layers from the first image, plus new layers created by the COPY and RUN instructions, and a read-write container layer. Docker already has all the layers from the first image, so it does not need to pull them again. The two images share any layers they have in common.
An image can consist of a single layer (that's often the case when the squash command was used). Each instruction in a Dockerfile results in a layer. (Except for multi-stage builds, where usually only the layers in the final image are pushed, or when an image is squashed to a single layer).
Use --squash flag on build It allows you to merge the new layers into one layer during the build time. To use it just add the flag to the build command: docker build --squash -t <image> . You can use it by activating the experimental features in the Docker settings.
It seems the match is only kept within Docker Engine for security concerns.
Well, obviously it has to be stored somewhere on disk because the information needs to be persistent. I'm using the overlay2
driver rather than aufs
, but I'm guessing the layout is relatively similar. Let's start with an image I have locally:
# docker images | grep alpine
alpine latest baa5d63471ea 5 months ago 4.8 MB
Which has the following layers:
# docker inspect alpine | jq '.[0].RootFS'
{
"Type": "layers",
"Layers": [
"sha256:011b303988d241a4ae28a6b82b0d8262751ef02910f0ae2265cb637504b72e36"
]
}
Let's look in /var/lib/docker
for something matching the image id prefix:
# cd /var/lib/docker
# find . -name 'baa5d63471ea*' -print
./image/overlay2/imagedb/content/sha256/baa5d63471ead618ff91ddfacf1e2c81bf0612bfeb1daf00eb0843a41fbfade3
That is a JSON file containing a bunch of data, including something that looks relevant:
"rootfs": {
"type": "layers",
"diff_ids": [
"sha256:011b303988d241a4ae28a6b82b0d8262751ef02910f0ae2265cb637504b72e36"
]
}
Great! Using this information, we should be able to take a layer id and find all the images that use it. For example, I have a locally built image that looks like:
# docker inspect larsks/qgroundcontrol | jq '.[0].RootFS'
{
"Type": "layers",
"Layers": [
"sha256:c854e44a1a5a22c9344c90f68ef6e07fd479e8ce1030fe141e3236887f1a4920",
"sha256:8ba4b4ea187c5ea58c11ee99bbc159b88b303c290b18c2220c9b477f4427bb9e",
"sha256:46c98490f5756634de1b1b9ed02a9fae2732984049f4f8fa182959fea924a45c",
"sha256:1633f88f8c9fa73c5c0c24f314e81b10dda6c310d41fb87eba02421e1652f6dc",
"sha256:0e20f4f8a593705219d1b3c5b1d2f7b8664eb04d706e99add87adbdcceea4a9f",
"sha256:cb16829cadf4f4320799bdf23f7400816f1552a011f3e30c2c929382896c3f6f",
"sha256:5e6951567308b8aacd8f6bded126ab33a72e7aa584d012a8d0d6283c29d32995",
"sha256:66a1378b08992e4043cf4e391d5b7f52f0d8c4b825dc62a2d87c23ba6ea1dd35",
"sha256:d397a7c12cc95021d41059a44c137000dfcbf12e6ba295ccb647c075e368e39c",
"sha256:8a2c46060eadf56c93467f9445cc49a715a935b0e3b4b439ae8c00fcf3a2157b",
"sha256:70a195ccb5fc7423cc15dd55fb446a19bfd2e1d1a4e5132b79f9433b7d7df750",
"sha256:349fbf13a3797683fe9a2c8355df2a272da391efab8e11c9e083e3c95c094859"
]
}
Let's find a list of other images that include the same base layer (sha256:c854e44a1a5a22c9344c90f68ef6e07fd479e8ce1030fe141e3236887f1a4920
):
# cd /var/lib/docker
# grep -rl sha256:c854e44a1a5a22c9344c90f68ef6e07fd479e8ce1030fe141e3236887f1a4920 image/overlay2/imagedb/
Which will return something like:
image/overlay2/imagedb/content/sha256/d404c11f391c3588ad665fa9ad3f779eb56efc1abbed3cf309b834c824d3c93f
image/overlay2/imagedb/content/sha256/dc3313d83519292279466fb5ee7913350d49b8d82f85d537b713ca83d75049e7
image/overlay2/imagedb/content/sha256/dda2981c2844dd1c4a5e004d8bc14633b445f61d23312abba8468251389ed0bc
image/overlay2/imagedb/content/sha256/e865d00f6e1e56e7efcfcaf111c52064fc732e68de3eace195492ebf66c7bc74
image/overlay2/imagedb/content/sha256/ea697b65eff199541ec38bbf6ee28085463f0679c9aec3867834f0c14d29d6f4
That is a list of image ids that include the same layer. If I want to map those ids back into names, I would need to consult image/overlay2/repositories.json
, which maps names to layers, or I would need to parse the output of docker images
. Maybe something like:
grep -rl sha256:c854e44a1a5a22c9344c90f68ef6e07fd479e8ce1030fe141e3236887f1a4920 image/overlay2/imagedb/ |
while read path; do
id=${path##*/}
docker images --no-trunc | grep $id | awk '{print $1, $3}'
done
Which on my system will output:
larsks/mavproxy sha256:0af8d29ecea9dc870ba0a7740d9f23a55aad8d9edacf4f89f6d6b239b58c7829
larsks/apmplanner sha256:5e715eb065698db5444af5ff341d30007d0b67507885f8aab89701ec2c4731fe
larsks/qgroundcontrol sha256:2a6265c23c52d1842ac38ea78fde670910dd40d15a8f0f62f60646ad9b209542
sitl sha256:7420866bd587f7b76fbd23b1c15d0a2b9ca5a04fd2d6e442c62a6b25a195b378
cmd/mavproxy sha256:d00e9707a3d8b1cae319ec88b4ccb26f111bb979ec1978cd32147274ab1704e4
cmd/apmplanner sha256:d17cae44602ad335f518276dfcc8a27a251b619f3f9037c55c278eb49d83d74b
cmd/qgroundcontrol sha256:d404c11f391c3588ad665fa9ad3f779eb56efc1abbed3cf309b834c824d3c93f
mavproxy sha256:dda2981c2844dd1c4a5e004d8bc14633b445f61d23312abba8468251389ed0bc
ubuntu sha256:f753707788c5c100f194ce0a73058faae1a457774efcda6c1469544a114f8644
...which seems reasonable.
Based on the inspiration given by larsks in the answer, I managed to find the location of the layers.
For example, suppose we want to find the location of the layer contributed by the COPY
step, which corresponds to an intermediate image with id 8fd47fed98d6
, we can inspect it first.
root@ruifeng-VirtualBox:/var/lib/docker# docker inspect 8fd47fed98d6 | jq -r '.[].RootFS'
{
"Layers": [
"sha256:a85f35566a268e6f4411c5157ffcffe4f5918b068b04d79fdd80003901ca39da",
"sha256:eaaf7298332642da0f8190fa4b96ad46c04b9c1d1682bc3a35d77bded2b1e0a9",
"sha256:33a212e8aa5642d3a2ddead146e85912407fc5bbb2a896dab11fcf329177a999",
"sha256:f1f25d8c6e56dc4891df147a77f57e756873b57f33ce95e6a0acbe47117c0c8a",
"sha256:67852b7d2cf5f0885293fa9df91ebfd8ef0c42ba11a5155f94806f3a96c5e916",
"sha256:480d48b7e2864a44c1b2fca0c7e32fbab505f7526ccb25bbfed191c04a9bb7b0",
"sha256:18d270fe64aa423e0ffdf24faf0103432027da3d5c12f4505e7daedad9fe2195",
"sha256:a73c3f5eb83790bc6d03381a43a20aef7d0d9d97de0cff4b040e8e4c01a3aee5",
"sha256:e8d1b67ace73cb92cc00725354e84024153bedae4280149c03fcb52f34d83757",
"sha256:19a4b80afc677825fec94adf8b6a45a866f42a38675f87f86e50171ff5e0a280"
],
"Type": "layers"
}
Now we try to look for the last layer.
root@ruifeng-VirtualBox:/var/lib/docker# find . -name '*19a4b80afc677825fec94adf8b6a45a866f42a38675f87f86e50171ff5e0a280*'
root@ruifeng-VirtualBox:/var/lib/docker#
But there is nothing on the disk. Perhaps there is some reference tree going on there. We can check the file contents in the layerdb.
root@ruifeng-VirtualBox:/var/lib/docker# grep -rl 19a4b80afc677825fec94adf8b6a45a866f42a38675f87f86e50171ff5e0a280 image/aufs/layerdb/
image/aufs/layerdb/sha256/f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449/diff
We can see that this layer is actually a diff
of f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449
. Let's find it.
root@ruifeng-VirtualBox:/var/lib/docker# find . -name '*f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449*'
./image/aufs/layerdb/sha256/f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449
And find the cache-id
that will direct us into the actual location in the aufs/diff
folder.
root@ruifeng-VirtualBox:/var/lib/docker# cat image/aufs/layerdb/sha256/f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449/cache-id
c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637
Let's go into the location and check.
root@ruifeng-VirtualBox:/var/lib/docker# cd aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637
root@ruifeng-VirtualBox:/var/lib/docker/aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637# find .
.
./home
./home/martian
./home/martian/water_tanks
./home/martian/water_tanks/IMG_0052.JPG
root@ruifeng-VirtualBox:/var/lib/docker/aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637#
It contains all files and directories that were intended to be copied into the image by the COPY
step. The size of the layer can be checked as well.
root@ruifeng-VirtualBox:/var/lib/docker# du -sh aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637
6.1M aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637
This will provide quite some insight into the Union File System and the Copy-on-Write mechanism used by Docker, if subsequent layers are also inspected in the same manner.
This can also be done in a reverse order. We can look for a file or directory that is intended to be inside the image, which should be somewhere inside aufs/diff
, and then use the cache-id
to trace back to the layers.
root@ruifeng-VirtualBox:/var/lib/docker# find . -name '*water_tanks*'
./aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637/home/martian/water_tanks
If this is a vfs this can be done with this one-liner:
ls -l /var/lib/docker/vfs/dir/$(cat $(grep $LAYER /var/lib/docker/image/vfs/layerdb/sha256/ -rl | sed s/diff/cache-id/g ))
where $LAYER
is the sha256 that you get out of docker inspect IMAGE_ID
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With