I was in a meeting on Google Meet and saw that you can turn on real-time subtitles. They've actually got a demo here on how real-time speech-to-text can be done, so that bit doesn't confuse me.
I had also been wanting to experiment with WebRTC (which I believe Google Meet uses) just to see its capabilities, e.g. the ability to share a screen without any additional plugins.
However, I've always been under the impression that a WebRTC video/audio stream is peer-to-peer between clients. The question I have is therefore:
How do they achieve this, and if they don't use WebRTC, is it possible to achieve it with WebRTC?
Google Meet (formerly Hangouts) is a massive application that makes extensive use of WebRTC.
Google Meet uses peer-to-peer connections only for calls with two participants. If any additional participant joins, the call immediately switches to sending and receiving data through a connection to a Google server.
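Roughly, the two-participant case looks like the sketch below. Note that `signalingChannel` is a placeholder for whatever transport (a WebSocket, for example) you use to exchange the SDP offer/answer and ICE candidates; that exchange is not defined by WebRTC itself, and the element IDs and STUN server here are just illustrative.

```typescript
// Hypothetical signaling transport (not part of WebRTC), e.g. a thin WebSocket wrapper.
declare const signalingChannel: { send(msg: unknown): void };

// One RTCPeerConnection per participant in the two-party (peer-to-peer) case.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});

// Forward our ICE candidates to the other participant as they are gathered.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    signalingChannel.send({ candidate: event.candidate });
  }
};

// Render whatever the remote peer sends us.
pc.ontrack = (event) => {
  const video = document.querySelector<HTMLVideoElement>("#remote");
  if (video) video.srcObject = event.streams[0];
};

// Create and send the offer that starts the call.
async function startCall(localStream: MediaStream): Promise<void> {
  localStream.getTracks().forEach((track) => pc.addTrack(track, localStream));
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signalingChannel.send({ sdp: pc.localDescription });
}
```

The important point is that this browser-side code barely changes when more than two participants join: the offer/answer is simply exchanged with Google's server instead of with another browser, which is why the switch is transparent to users.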
WebRTC uses the user's camera and microphone to capture and stream audio and video; the getUserMedia API gives access to input devices such as the microphone and the webcam. When developers integrate WebRTC into their website, they can apply constraints on how the audio and video are captured and streamed.
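For example, capturing the camera and microphone with constraints looks roughly like this; the specific resolution, frame-rate, and element-ID values are just placeholder examples:

```typescript
// Ask the browser for camera + microphone access, with constraints on both tracks.
async function captureLocalMedia(): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { echoCancellation: true, noiseSuppression: true },
    video: {
      width: { ideal: 1280 },
      height: { ideal: 720 },
      frameRate: { max: 30 },
    },
  });

  // Preview the captured stream locally before adding it to a peer connection.
  const preview = document.querySelector<HTMLVideoElement>("#local");
  if (preview) preview.srcObject = stream;

  return stream;
}
```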
WebRTC is a peer-to-peer communication protocol.
Google Meet is using WebRTC. The "peer" in that case is a server, not another browser. While it is six years old and some details have changed, much of this old article is still true. On the server, Google can do audio processing.
This video describes the architecture required for speech-to-text (and, in fact, translation plus text-to-speech on top of that).
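For a rough idea of what the speech-to-text step could look like on such a server, here is a minimal sketch using Google's Cloud Speech-to-Text Node.js client. It assumes the media server has already decoded the call audio into a raw 16 kHz PCM stream (`audioFromCall` below); the encoding, sample rate, and delivery of captions back to participants are assumptions for illustration, not Meet's actual pipeline.

```typescript
import { SpeechClient } from "@google-cloud/speech";
import { Readable } from "stream";

// `audioFromCall` stands in for the decoded audio the media server extracts
// from the WebRTC stream; obtaining it is outside the scope of this sketch.
function streamCaptions(audioFromCall: Readable): void {
  const client = new SpeechClient();

  const recognizeStream = client
    .streamingRecognize({
      config: {
        encoding: "LINEAR16",   // assumed: raw 16-bit PCM
        sampleRateHertz: 16000, // assumed sample rate
        languageCode: "en-US",
      },
      interimResults: true,     // emit partial results for live captions
    })
    .on("error", console.error)
    .on("data", (data) => {
      const transcript = data.results?.[0]?.alternatives?.[0]?.transcript;
      if (transcript) {
        // In a real system this would be pushed back to participants,
        // e.g. over a WebRTC data channel or a WebSocket.
        console.log(`caption: ${transcript}`);
      }
    });

  // Pipe the raw call audio into the recognizer.
  audioFromCall.pipe(recognizeStream);
}
```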