youtube-dl can be used to see what formats are used to store YouTube content:
youtube-dl -F https://youtu.be/??????
The above command hints that the audio and video are mostly stored separately. Is it right? Does YouTube streaming combine audio and video in real-time?

Most large streaming services will use ABR streaming (see: https://stackoverflow.com/a/42365034/334402).
The two most common ABR streaming formats are HLS and MPEG-DASH and both provide a manifest or index file which the player downloads first and which will contain links to the media streams, typically audio, video, subtitle tracks etc.
For encrypted content the audio and video, and even different bit rate video tracks, may all have separate encryption keys.
The player will download the audio and video tracks and synchronise them for playback.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With