
How does cloud-init work?

Tags:

cloud-init

cloud-init is a package that performs various configuration tasks on a virtual machine at first boot. You write a file with your configuration and pass it to the VM when you launch it.

But how exactly does it work? How is the user data sent to the VM, and how does cloud-init manage to apply the configuration?

Thank you.

Nakrule asked Jun 25 '26 11:06


1 Answer

Disclaimer: cloud-init is very complex, and there are lots of supported cloud vendors, and it's used in lots of different ways, but I think this is a fairly accurate simplified overview.

Couple of minor corrections first: cloud-init can run on any machine, not just a VM, and it can run on any boot, not just 'first boot'. It's basically just a way to run scripts during boot. Current Ubuntu server images, for example, come with cloud-init pre-installed, and it runs during boot, even on your desktop.

However, the main use case is first boot of "cloud images". The problem here is that cloud vendors want to ship an official distro release which just works, without the end-user having to actually carry out an installation, or the cloud vendor having to modify the distro in some way. cloud-init handles this by retrieving configuration data at various points during the boot process. In practice, this tends to be user names, passwords, ssh keys, locales, hostnames, additional repos, and so on. In other words, the sort of stuff you would have manually typed in during an installation, but normally without the network setup.
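As a rough illustration of the kind of settings listed above, here is a minimal sketch that builds a user-data document in cloud-init's `#cloud-config` format. The function name and the hostname, user name and key passed to it are made-up placeholders. The body is written as JSON (which is also valid YAML) only to avoid a third-party YAML dependency; a real user-data file would normally be hand-written YAML.

```python
import json


def build_cloud_config(hostname, username, ssh_public_key, locale="en_US.UTF-8"):
    """Return a user-data string in cloud-init's #cloud-config format.

    All argument values are placeholders chosen by the caller; the keys
    (hostname, locale, users, ssh_authorized_keys) are cloud-config keys.
    """
    config = {
        "hostname": hostname,
        "locale": locale,
        "users": [
            {
                "name": username,
                "ssh_authorized_keys": [ssh_public_key],
            }
        ],
    }
    # The first line must be the "#cloud-config" marker. The body is YAML,
    # and JSON is emitted here because it is also valid YAML.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_cloud_config("demo-host", "demo", "ssh-ed25519 PLACEHOLDER demo@example"))
```

This is the text you would paste into the user-data field, or serve from one of the data sources described below.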

cloud-init can frequently determine exactly what it is running on during boot, by querying the DMI/SMBIOS, or a specific file such as /proc/1/environ. In these cases, it has built-in knowledge of where to find the required configuration data. In general, however, the data will come from the network or, failing that, a filesystem that is bundled with the image.
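To make the detection step a little more concrete, here is a small sketch that reads the same kinds of sources by hand: the DMI fields that Linux exposes under `/sys/class/dmi/id`, and the `container` variable that some container managers set in the environment of PID 1. This is not cloud-init's own detection logic. The function names are hypothetical, and reading `/proc/1/environ` usually needs root.

```python
from pathlib import Path


def read_text_or_none(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text(errors="replace").strip()
    except OSError:
        return None


def parse_environ(raw):
    """Parse the NUL-separated KEY=VALUE bytes found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def platform_hints(sys_root="/sys", proc_root="/proc"):
    """Collect a few hints about the platform (an illustration only)."""
    dmi = Path(sys_root) / "class" / "dmi" / "id"
    hints = {
        "sys_vendor": read_text_or_none(dmi / "sys_vendor"),
        "product_name": read_text_or_none(dmi / "product_name"),
    }
    try:
        raw = (Path(proc_root) / "1" / "environ").read_bytes()
    except OSError:
        # Missing file or insufficient permissions: treat as "unknown".
        raw = b""
    hints["container"] = parse_environ(raw).get("container")
    return hints


if __name__ == "__main__":
    print(platform_hints())
```

On a cloud VM the vendor and product strings often name the hypervisor or provider, which is the sort of hint that lets the boot process pick a data source without being told.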

Many (most? all?) cloud vendors run a private webserver that the image can reach. The image is set up for DHCP on eth0 (it can instead retrieve the required network configuration from another data source, but I think it's much more common just to use DHCP, which is the fallback position). The webserver responds to requests from cloud-init for the user, vendor, and instance data. If you've created a VM at a cloud provider you'll have seen a user-data block that you can fill in; this is returned to cloud-init as the user data.

The docs have a simple tutorial which does exactly this: it uses QEMU to run an image, and the qemu-system-x86_64 command line sets the image's SMBIOS information to specify where the Python webserver is (10.0.2.2:8000). In practice, most cloud vendors serve private data from 169.254.169.254. This is the 'Instance Metadata Service' (IMDS).
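The webserver side of this can be sketched in a few lines of Python. The example below writes `user-data`, `meta-data` and `vendor-data` files into a temporary directory, serves them over HTTP, and fetches `user-data` back the way a client would. It is only an approximation of the tutorial setup: it binds to the loopback address on a free port rather than 10.0.2.2:8000, no VM is involved, and the function names and the instance id are made up for illustration.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def write_seed(directory, user_data):
    """Create the three files the webserver will hand out."""
    seed = Path(directory)
    (seed / "user-data").write_text(user_data)
    (seed / "meta-data").write_text("instance-id: demo-instance\n")  # placeholder id
    (seed / "vendor-data").write_text("")
    return seed


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve a directory over HTTP in a background thread (port 0 = any free port)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(base_url, name, timeout=5):
    """Fetch one file by name, bypassing any proxy configured in the environment."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/{name}", timeout=timeout) as response:
        return response.read().decode()


def demo():
    """Write a seed, serve it, and read user-data back over HTTP."""
    with tempfile.TemporaryDirectory() as tmp:
        write_seed(tmp, "#cloud-config\nhostname: demo-host\n")
        server = serve_directory(tmp)
        try:
            host, port = server.server_address[:2]
            return fetch(f"http://{host}:{port}", "user-data")
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())
```

A vendor's IMDS is conceptually the same thing at a fixed link-local address, although each vendor defines its own URL layout and authentication rules.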

There are various other ways to get the data, in addition to or instead of IMDS: a disk partition labelled config-2, for example, which is attached to the instance when it boots, or the kernel command line, or specific files in the filesystem.
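For the kernel command line case, one form I'm aware of is a `ds=` token with semicolon-separated options, for example `ds=nocloud;s=http://10.0.2.2:8000/`. The sketch below pulls such a token out of a command line string. `parse_ds_argument` is a hypothetical helper, not a cloud-init function, and it splits on whitespace only, so it does not handle quoted arguments.

```python
def parse_ds_argument(cmdline):
    """Extract a 'ds=' token from a kernel command line string.

    Returns (datasource_name, options) or None if no such token is present.
    """
    for token in cmdline.split():
        if token.startswith("ds="):
            name, *pairs = token[len("ds="):].split(";")
            options = {}
            for pair in pairs:
                # Split on the first "=" only, so URLs keep any later "=".
                key, sep, value = pair.partition("=")
                if sep:
                    options[key] = value
            return name, options
    return None


if __name__ == "__main__":
    # On a real system the string would come from /proc/cmdline.
    example = "root=/dev/sda1 ds=nocloud;s=http://10.0.2.2:8000/ quiet"
    print(parse_ds_argument(example))
```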

Note that cloud-init fits a very specific niche, where a vendor has to provide a standard image to an end-user, with some customisation. You can run custom images at a cloud vendor without cloud-init, but some vendors won't let you install custom images, for reasons best known to themselves.

EML answered Jun 29 '26 01:06


