I am building a WebRTC application in which users can share their camera and their screen. When a client receives a stream/track, it needs to know whether it is a camera stream or a screen recording stream. This distinction is obvious at the sending end, but the distinction is lost by the time the tracks reach the receiving peer.
Here's some sample code from my application:
// Note the distinction between streams is obvious at the sending end.
const localWebcamStream = await navigator.mediaDevices.getUserMedia({ ... });
const screenCaptureStream = await navigator.mediaDevices.getDisplayMedia({ ... });
// This is called by signalling logic
function addLocalTracksToPeerConn(peerConn) {
  // Our approach here loses information because our two distinct streams 
  // are added to the PeerConnection's homogeneous bag of streams
  for (const track of screenCaptureStream.getTracks()) {
    peerConn.addTrack(track, screenCaptureStream);
  }
  for (const track of localWebcamStream.getTracks()) {
    peerConn.addTrack(track, localWebcamStream);
  }
}
// This is called by signalling logic
function handleRemoteTracksFromPeerConn(peerConn) {
  peerConn.ontrack = ev => {
    const stream = ev.streams[0];
    if (stream is a camera stream) {  // FIXME how to distinguish reliably?
      remoteWebcamVideoEl.srcObject = stream;
    }
    else if (stream is a screen capture) {  // FIXME how to distinguish reliably?
      remoteScreenCaptureVideoEl.srcObject = stream;
    }
  };
}
My ideal imaginary API would allow adding a .label to a track or stream, like this:
// On sending end, add arbitrary metadata
track.label = "screenCapture";
peerConn.addTrack(track, screenCaptureStream);
// On receiving end, retrieve arbitrary metadata
peerConn.ontrack = ev => {
  const trackType = ev.track.label;  // get the label when receiving the track
};
But this API does not really exist.
There is a MediaStreamTrack.label property,
but it's read-only, and not preserved in transmission.
By experimentation,
the .label property at the sending end is informative (e.g. label: "FaceTime HD Camera (Built-in) (05ac:8514)").
But at the receiving end, the .label for the same track is not preserved.
(It appears to be replaced with the .id of the track - in Chrome, at least.)
This article by Kevin Moreland describes the same problem, and recommends a mildly terrifying solution: munge the SDP on the sending end, and then grep the SDP on the receiving end. But this solution feels very fragile and low-level.
I know there is a MediaStreamTrack.id property.
There is also a MediaStream.id property.
Both of these appear to be preserved in transmission.
This means I could send the metadata in a side-channel,
such as the signalling channel or a DataChannel.
From the sending end, I would send { "myStreams": { "screen": "<some stream id>", "camera": "<another stream id>" } }.
The receiving end would wait until it has both the metadata and the stream before displaying anything.
However, this approach introduces a side-channel (and inevitable concurrency challenges associated with that),
where a side-channel feels unnecessary.
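For concreteness, the waiting logic described above could be sketched roughly as follows. This is only an illustration: createStreamMatcher, addStream, addMetadata and onMatched are hypothetical names, and the message shape is the { "myStreams": ... } object from above.

```javascript
// Pairs remote streams with side-channel metadata, whichever arrives first.
// All names here are hypothetical; only the message shape comes from the text above.
function createStreamMatcher(onMatched) {
  const pendingStreams = new Map(); // stream id -> stream still waiting for metadata
  const knownTypes = new Map();     // stream id -> "screen" | "camera"

  return {
    // Call from peerConn.ontrack with ev.streams[0].
    addStream(stream) {
      if (knownTypes.has(stream.id)) {
        // May run once per track of the same stream; callers should tolerate repeats.
        onMatched(knownTypes.get(stream.id), stream);
      } else {
        pendingStreams.set(stream.id, stream);
      }
    },
    // Call when the side-channel message arrives.
    addMetadata(message) {
      for (const [type, streamId] of Object.entries(message.myStreams)) {
        knownTypes.set(streamId, type);
        const stream = pendingStreams.get(streamId);
        if (stream) {
          pendingStreams.delete(streamId);
          onMatched(type, stream);
        }
      }
    },
  };
}
```

The receiving end would then assign srcObject inside onMatched instead of directly inside ontrack.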
I'm looking for an idiomatic, robust solution. How do I label/identify MediaStreams at the sending end, so that the receiving end knows which stream is which?
I ended up sending this metadata in the signaling channel. Each signaling message that contains a SessionDescription (SDP) now also contains a metadata object alongside it, which annotates the MediaStreams that are described in the SDP. This has no concurrency issues, because clients will always receive the SDP+metadata for a MediaStream before the track event is fired for that MediaStream.
So previously I had signaling messages like this:
{
  "kind": "sessionDescription",
  // An RTCSessionDescriptionInit
  "sessionDescription": { "type": "offer", "sdp": "..." }
}
Now I have signaling messages like this:
{
  "kind": "sessionDescription",
  // An RTCSessionDescriptionInit
  "sessionDescription": { "type": "offer", "sdp": "..." },
  // A map from MediaStream IDs to arbitrary domain-specific metadata
  "mediaStreamMetadata": {
    "y6w4u6e57654at3s5y43at4y5s46": { "type": "camera" },
    "ki8a3greu6e53a4s46uu7dtdjtyt": { "type": "screen" }
  }
}
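A rough sketch of how both ends might produce and consume such messages is below. The helper names (buildSessionDescriptionMessage, rememberStreamMetadata, streamTypeFor) and the streamsByType argument are made up for illustration; only the message shape comes from the example above.

```javascript
// Sender side: attach metadata to the outgoing signaling message.
// `streamsByType` is a hypothetical object such as
// { camera: localWebcamStream, screen: screenCaptureStream }.
function buildSessionDescriptionMessage(sessionDescription, streamsByType) {
  const mediaStreamMetadata = {};
  for (const [type, stream] of Object.entries(streamsByType)) {
    mediaStreamMetadata[stream.id] = { type };
  }
  return { kind: "sessionDescription", sessionDescription, mediaStreamMetadata };
}

// Receiver side: remember the metadata from every incoming message...
const remoteStreamMetadata = new Map(); // MediaStream id -> metadata object

function rememberStreamMetadata(message) {
  const metadata = message.mediaStreamMetadata || {};
  for (const [streamId, meta] of Object.entries(metadata)) {
    remoteStreamMetadata.set(streamId, meta);
  }
}

// ...and look it up when a track arrives. Returns undefined for unknown streams.
function streamTypeFor(stream) {
  const meta = remoteStreamMetadata.get(stream.id);
  return meta ? meta.type : undefined;
}
```

On the receiving end, rememberStreamMetadata(message) would be called before peerConn.setRemoteDescription(message.sessionDescription), so that the ontrack handler can branch on streamTypeFor(ev.streams[0]).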
A more canonical way to signal a custom stream label is to modify the SDP before sending it (but after calling setLocalDescription) and change the msid attribute (msid stands for media stream id; see the specification).
The advantage here is that on the remote end, the media stream id is parsed from the SDP and is visible on the stream delivered with the ontrack event. See this fiddle.
Note that you cannot make any assumptions about the track id. In Firefox, the track id in the SDP does not even match the track id on the sender side.
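A minimal sketch of the msid rewrite, assuming the SDP carries lines of the form a=msid:<stream id> <track id> (and, in some browsers, a=ssrc:<n> msid:<stream id> <track id>). relabelStreamInSdp is a hypothetical helper, not a WebRTC API; it does not touch a=msid-semantic lines, and it has not been checked against every browser.

```javascript
// Replace one media stream id with a custom label in an SDP string.
// Only the stream id is changed; the track id is left alone.
// `newStreamId` must not contain whitespace.
function relabelStreamInSdp(sdp, oldStreamId, newStreamId) {
  return sdp
    .split("\r\n")
    .map(line => {
      // match[1] = attribute prefix, match[2] = stream id, match[3] = rest of the line
      const match = /^(a=msid:|a=ssrc:\d+ msid:)(\S+)(.*)$/.exec(line);
      if (match && match[2] === oldStreamId) {
        return match[1] + newStreamId + match[3];
      }
      return line;
    })
    .join("\r\n");
}
```

The sender would apply this to peerConn.localDescription.sdp after setLocalDescription, for example relabelStreamInSdp(sdp, screenCaptureStream.id, "screen"), and send the result. The receiver could then compare ev.streams[0].id against "screen" in ontrack.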