I am trying to build an application like Twitch (i.e. many-to-many real-time video streams). I want to use WebRTC because I want to make the app accessible from all platforms (I am planning to either go with NativeScript or the PWA road). My plan is to stream the camera from person A to the media server, transcode the WebRTC stream into multiple qualities, etc., and send it to all of the subscribed users, who are also able to play WebRTC streams. In the ideal case, there will be thousands of streamers, each with thousands of real-time subscribers.
How can this be done, however? I need some kind of media server which will be responsible for taking the stream, transcoding it and forwarding it. An MVP would be to just forward the stream without transcoding it; however, it should be possible to add that optimization in the future.
Should I go for something like Kurento, Jitsi, etc? Or is it feasible that I can build this server myself?
Is this architecture even a good idea, or should I rethink everything? The reason I am not going for RTMP or something similar is the amount of code and work that would have to be put into developing the different clients for the native apps (iOS, Android, any ol' browser). If I can use WebRTC, the client code will be much simpler and the app will be accessible on all platforms.
Thanks a lot in advance!
This is an ambitious project; it will be complicated.
First of all: a media server is a good (even a necessary) choice if you plan to write a large-scale application. And either transcoding on the server or having the sender send multiple video qualities (simulcast) will improve your users' experience.
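To make the "sender sends multiple qualities" option concrete: in the browser, simulcast is configured through `sendEncodings` when adding the video transceiver. The following is only a sketch; the three-rung ladder, the RID names, and the bitrate numbers are illustrative assumptions, and signaling to the SFU is out of scope:

```javascript
// Hypothetical three-rung simulcast ladder, identified by RID.
// The rids, scale factors, and bitrates are illustrative assumptions.
const simulcastEncodings = [
  { rid: "q", scaleResolutionDownBy: 4, maxBitrate: 150_000 },  // quarter resolution
  { rid: "h", scaleResolutionDownBy: 2, maxBitrate: 500_000 },  // half resolution
  { rid: "f", maxBitrate: 2_500_000 },                          // full resolution
];

// Browser-side publish sketch; `pc` is an RTCPeerConnection already
// created by your (assumed) signaling code.
async function publish(pc) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  pc.addTransceiver(stream.getVideoTracks()[0], {
    direction: "sendonly",
    sendEncodings: simulcastEncodings,
  });
  pc.addTransceiver(stream.getAudioTracks()[0], { direction: "sendonly" });
}
```

The SFU can then forward whichever rung fits each subscriber's bandwidth, without transcoding anything itself.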
Now to the media servers: you are only interested in forwarding media, so you want an SFU (Selective Forwarding Unit) or a similar server. Jitsi and Kurento can both do this (though Kurento is more often thought of as a mixing server). I cannot advise you on which of them to use, since I do not have enough experience with either.
The SFU approach scales well and may be enough for your application, but for a large-scale service like Twitch you may also want to look at CDN-backed technologies. Classically, these are DASH or HLS. Both will increase latency, but since you do not have two or more users speaking with each other in real time, only one person broadcasting, this may be tolerable.
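On the DASH/HLS side, the "multiple qualities" idea shows up as a rendition ladder that the player switches between. A minimal HLS master playlist might look like this (the bandwidths, resolutions, and file names are made-up examples, not values from any real deployment):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```

Each variant playlist points at segment files that a CDN can cache, which is what makes this approach scale to thousands of viewers per stream.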
I have looked into this problem myself and found something related here which may help you. The basic idea is to send the video to the server via WebRTC (low latency, but not necessarily scaling well once you reach the SFU's limits) and then encode the video for DASH (or the more optimized CMAF). Your content can then be served by CDNs, which may allow you to scale up the service.
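The "encode the video for DASH" step in that pipeline is usually done with an encoder/packager such as ffmpeg. The following is only a command sketch under assumptions: the flags shown are real ffmpeg options, but the bitrates and resolutions are illustrative, and `input.mp4` stands in for whatever your WebRTC ingest actually hands you:

```shell
# Sketch: transcode one input into two DASH video renditions plus audio.
# Bitrates/resolutions are illustrative; "input.mp4" is a placeholder input.
ffmpeg -i input.mp4 \
  -map 0:v -map 0:v -map 0:a \
  -c:v libx264 -c:a aac \
  -b:v:0 800k  -s:v:0 640x360 \
  -b:v:1 2800k -s:v:1 1920x1080 \
  -adaptation_sets "id=0,streams=v id=1,streams=a" \
  -f dash out/manifest.mpd
```

In a live setup you would run this continuously against the ingest stream rather than a file, and push the generated segments to your CDN origin.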
But why not have a look at what Twitch says about itself? Here's an article where they list their services and talk briefly about how they work: twitch engineering - an introduction and overview