Voice chat between Node.js and browser (audio streams, VoIP)

I have done voice chatting between two node.js servers before (see: tvoip), which works quite well, but now I would like to do it between a node.js server and a browser. How could this be done?
From node.js to node.js I simply used raw PCM streams over a TCP connection.
For the browser this is probably not going to be that easy, right? I mean the browser doesn't really offer a TCP API. It does offer a WebSocket API, but does it handle streams? Would I have to convert the streams and if so into what format and how? What protocol should I use? Are there any helpful libraries to accomplish this already? Is socket.io-stream a viable library to send these kinds of streams?

From what I understand the audio streams are in the PCM format on the browser. So they should be compatible with the streams I got in Node.js. Is that assumption correct?

I have managed to pipe the browser mic input to the browser speaker output like this:

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
</head>
<body>

<!-- alternative method that also works
<audio></audio>
<script>
navigator.mediaDevices.getUserMedia({ audio: true }).then(function(stream) {
    const audio = document.querySelector('audio')
    audio.srcObject = stream
    audio.onloadedmetadata = function(e) {
        audio.play()
    }
}).catch(console.error)
</script>
-->
<script>
    navigator.mediaDevices.getUserMedia({audio: true}).then(stream => {
        const aCtx = new AudioContext()
        const analyser = aCtx.createAnalyser()
        const microphone = aCtx.createMediaStreamSource(stream)
        microphone.connect(analyser)
        analyser.connect(aCtx.destination)
    }).catch(err => {
        console.error("Error getting audio stream from getUserMedia:", err)
    })
</script>

</body>
</html>

As you can see I found two solutions. I will try to base the node<->browser voice chat on the second one.

For Node.js I came up with this code to pipe a node.js mic input to a node.js speaker output:

const mic = require('mic')
const Speaker = require('speaker')

const micInstance = mic({ // arecord -D hw:2,0 -f S16_LE -r 44100 -c 1
    device: 'hw:2,0',           //   -D hw:2,0
    encoding: 'signed-integer', //             -f S
    bitwidth: '16',             //                 16
    endian: 'little',           //                   _LE
    rate: '44100',              //                       -r 44100
    channels: '1',              //                                -c 1
    debug: true
})
const micInputStream = micInstance.getAudioStream()

const speakerInstance = new Speaker({ // | aplay -D plughw:2,0
    channels: 1,
    bitDepth: 16,
    sampleRate: 44100,
    signed: true,
    device: 'plughw:2,0' //'plughw:NVidia,7'
})
speakerInstance.on('open', ()=>{
    console.log("Speaker received stuff")
})

// Pipe the readable microphone stream to the writable speaker stream:
micInputStream.pipe(speakerInstance)

micInputStream.on('data', data => {
    //console.log("Received Input Stream: " + data.length)
})
micInputStream.on('error', err => {
    console.error("Error in Input Stream: " + err)
})
micInstance.start()

console.log('Started')

Finding the right device for mic and speaker can be a bit tricky if you are not familiar with ALSA under Linux (arecord -l lists the capture devices and aplay -l the playback devices; hw:2,0 means card 2, device 0). I am not certain how it works on Windows and Mac OS with SoX.

I then came up with a small test application to connect the two ideas using socket.io-stream (a socket.io library that allows sending streams over a socket). And obviously, this is where I'm stuck.

Basically, I try this on the node.js side:

const mic = require('mic')
const Speaker = require('speaker')
const SocketIO = require('socket.io')
const ss = require('socket.io-stream')

...

io.on('connection', socket => {
    let micInstance = mic(micConfig)
    let micInputStream = micInstance.getAudioStream()
    let speakerInstance = new Speaker(speakerConfig)

    ...

    ss(socket).on('client-connect', (stream, data) => { // stream: duplex stream
        stream.pipe(speakerInstance) //speakerInstance: writable stream
        micInputStream.pipe(stream) //micInputStream: readable stream
        micInstance.start()
    })
})

and this on the browser side:

const socket = io()
navigator.mediaDevices.getUserMedia({audio:true}).then(clientMicStream => { // Get microphone input
    // Create a duplex stream using the socket.io-stream library's ss.createStream() method and emit it to the server
    const stream = ss.createStream() //stream: duplex stream
    ss(socket).emit('client-connect', stream)

    // Send microphone input to the server by piping it into the stream
    clientMicStream.pipe(stream) //clientMicStream: readable stream
    // Play audio received from the server through the stream
    const aCtx = new AudioContext()
    const analyser = aCtx.createAnalyser()
    const microphone = aCtx.createMediaStreamSource(stream)
    microphone.connect(analyser)
    analyser.connect(aCtx.destination)
}).catch(e => {
    console.error('Error capturing audio.', e)
    alert('Error capturing audio.')
})

The whole code can be viewed at: https://github.com/T-vK/node-browser-audio-stream-test
(The README.md contains instructions on how to set it up, if you want to test it.) The relevant code is in server.js (the setupStream() function contains the interesting part) and client.html.

As you can see, I'm trying to send the duplex stream over the connection, pipe the microphone input into the duplex stream, and pipe the duplex stream to the speaker on each end (like I did in tvoip). It does not work atm, though.
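One fallback I'm considering is dropping socket.io-stream entirely and sending the raw PCM as binary WebSocket messages. Below is a minimal sketch of what the Node.js side of that could look like; the ws package, the port, and the assumption that the browser sends 16-bit little-endian mono PCM at 44100 Hz are my own choices, not something the libraries dictate:

const WebSocket = require('ws')
const mic = require('mic')
const Speaker = require('speaker')

const wss = new WebSocket.Server({ port: 8080 })

wss.on('connection', ws => {
    const speakerInstance = new Speaker({ channels: 1, bitDepth: 16, sampleRate: 44100, signed: true })
    const micInstance = mic({ encoding: 'signed-integer', bitwidth: '16', endian: 'little', rate: '44100', channels: '1' })
    const micInputStream = micInstance.getAudioStream()

    // Browser -> Node: write each binary message straight to the speaker.
    ws.on('message', chunk => speakerInstance.write(chunk))

    // Node -> browser: forward raw mic chunks as binary messages.
    micInputStream.on('data', chunk => {
        if (ws.readyState === WebSocket.OPEN) ws.send(chunk)
    })

    micInstance.start()
    ws.on('close', () => micInstance.stop())
})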

Edit:

I'm not sure if I get this right, but the "stream" that I get from getUserMedia() is a MediaStream, and this media stream can have MediaStreamTracks (audio, video or both). In my case it would obviously just be one track (audio). But a MediaStreamTrack doesn't seem to be a stream as I know it from Node.js, meaning that it can't just be piped. So maybe it would have to be converted into one. I found this interesting library called microphone-stream which claims to be able to do it. But it doesn't seem to be available as a simple browser library; it seems to require wrapping your whole project with browserify, which seems like overkill. I'd like to keep it simple.
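If the conversion has to happen by hand, the Web Audio API itself seems to be enough, without browserify: a ScriptProcessorNode (deprecated, but still widely supported) hands you the raw Float32 samples, which can be converted to 16-bit PCM and sent as binary WebSocket messages. A minimal sketch of the browser side; the server URL and the 16-bit little-endian format are my assumptions and have to match whatever the Node side expects:

const ws = new WebSocket('ws://localhost:8080')

navigator.mediaDevices.getUserMedia({ audio: true }).then(mediaStream => {
    const aCtx = new AudioContext()
    const source = aCtx.createMediaStreamSource(mediaStream)
    // 4096-sample buffer, 1 input channel, 1 output channel
    const processor = aCtx.createScriptProcessor(4096, 1, 1)

    processor.onaudioprocess = e => {
        const float32 = e.inputBuffer.getChannelData(0) // Float32 samples in [-1, 1]
        const int16 = new Int16Array(float32.length)    // note: uses platform endianness
        for (let i = 0; i < float32.length; i++) {
            const s = Math.max(-1, Math.min(1, float32[i]))
            int16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF
        }
        if (ws.readyState === WebSocket.OPEN) ws.send(int16.buffer)
    }

    source.connect(processor)
    processor.connect(aCtx.destination) // some browsers only fire onaudioprocess when connected
}).catch(console.error)

One thing to watch out for: the browser picks the AudioContext sample rate (often 44100 or 48000 Hz), so the Speaker on the Node side should be configured with whatever aCtx.sampleRate reports rather than a hard-coded value.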

asked May 30 '18 by Forivin


1 Answer

There exists a standard for doing VoIP that is supported by all major browsers: WebRTC. Although it is a dreadful beast of complexity, all major browsers support it out of the box and hide that complexity from you. I am no JavaScript developer, but I assume there is good support for it in the JS world; look at e.g. this blogpost.
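For a feel of the size of it, the browser side of an audio-only WebRTC call is fairly small once signalling is factored out. A minimal sketch; how the offer/answer and ICE candidates are exchanged (e.g. over the existing socket.io connection) is up to you, and the STUN server URL is just an example:

const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] })

navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
    // Add the microphone track to the peer connection
    stream.getTracks().forEach(track => pc.addTrack(track, stream))
    return pc.createOffer()
}).then(offer => pc.setLocalDescription(offer)).then(() => {
    // send pc.localDescription to the remote peer over your signalling channel
})

pc.onicecandidate = e => {
    if (e.candidate) {
        // send e.candidate to the remote peer over your signalling channel
    }
}

pc.ontrack = e => {
    // Play the remote audio track
    const audio = new Audio()
    audio.srcObject = e.streams[0]
    audio.play()
}

On the Node.js side, the wrtc package aims to provide the same RTCPeerConnection API outside the browser, though I have not verified it for this particular use case.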

If you do not want the full-featured, possibly overkill solution, I would fall back to RTP as the streaming protocol, which is kind of the standard in VoIP, and Opus for encoding. Both are well-established technologies and form something of a default pair for VoIP streaming; RTP is lightweight, and Opus compresses efficiently while retaining high audio quality. They ought to be well supported in both the browser and Node.js environments.

Beware: if you decide to send plain PCM, define all the parameters precisely: sample size (8, 16, or 32 bit), signed/unsigned, integer/float, and especially endianness!
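To make the endianness point concrete: a DataView lets you pick the byte order explicitly, whereas an Int16Array silently uses whatever the host machine uses. A small sketch of packing Web Audio Float32 samples into signed 16-bit little-endian PCM:

function floatTo16BitPCM(float32) {
    const buffer = new ArrayBuffer(float32.length * 2)
    const view = new DataView(buffer)
    for (let i = 0; i < float32.length; i++) {
        const s = Math.max(-1, Math.min(1, float32[i])) // clamp to [-1, 1]
        // 'true' selects little-endian, matching e.g. mic's endian: 'little'
        view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true)
    }
    return buffer
}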

answered Oct 04 '22 by Michael Beer