- Desired Behaviour
- Actual Behaviour
- What I've Tried
- Steps To Reproduce
- Research
Desired Behaviour
Pipe multiple readable streams, received from multiple api requests, to a single writeable stream.
The api responses are from ibm-watson's textToSpeech.synthesize() method.
Multiple requests are required because the service has a 5KB limit on text input, so a string of 18KB, for example, requires four requests to complete.
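The chunking itself is outside the scope of this question, but for context, here is a rough sketch of how a long string might be split into sub-5KB pieces (the splitTextIntoChunks helper is hypothetical and not part of the code below):

// hypothetical helper: split text into pieces below the service's size limit,
// breaking on whitespace so words are not cut in half
function splitTextIntoChunks(text, maxBytes = 4000) {
  const chunks = [];
  let current = "";
  for (const word of text.split(/\s+/)) {
    const candidate = current ? current + " " + word : word;
    if (Buffer.byteLength(candidate, "utf8") > maxBytes && current) {
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}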
Actual Behaviour
The writeable stream file is incomplete and garbled.
The application seems to 'hang'.
When I try to open the incomplete .mp3 file in an audio player, it says the file is corrupted.
The process of opening and closing the file seems to increase its file size - as if opening the file somehow prompts more data to flow into it.
The undesirable behaviour is more apparent with larger inputs, eg four strings of 4000 bytes or less.
What I've Tried
I've tried several methods to pipe the readable streams to either a single writeable stream or multiple writeable streams, using the npm packages combined-stream, combined-stream2, multistream and archiver, and they all result in incomplete files. My last attempt doesn't use any packages and is shown in the Steps To Reproduce section below.
I am therefore questioning each part of my application logic:
01. What is the response type of a Watson Text to Speech API request? (a diagnostic sketch for checking this at runtime follows this list)
The Text to Speech docs say the API response type is:
Response type: NodeJS.ReadableStream|FileObject|Buffer
I am confused that the response type is one of three possible things. In all my attempts, I have been assuming it is a readable stream.
02. Can I make multiple API requests in a map function?
03. Can I wrap each request within a promise() and resolve the response?
04. Can I assign the resulting array to a promises variable?
05. Can I declare var audio_files = await Promise.all(promises)?
06. After this declaration, are all responses 'finished'?
07. How do I correctly pipe each response to a writable stream?
08. How do I detect when all pipes have finished, so I can send the file back to the client?
For questions 2 - 6, I am assuming the answer is 'YES'.
I think my failures relate to questions 7 and 8.
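Regarding question 01, here is a small diagnostic sketch (not part of the application code) that uses the same callback style as the code below to check what the response actually is at runtime - textToSpeech and synthesizeParams are assumed to be set up as in the Steps To Reproduce section:

textToSpeech.synthesize(synthesizeParams, (err, audio) => {
  if (err) return console.log(err);
  // a readable stream exposes a pipe() method, a Buffer passes Buffer.isBuffer()
  console.log('looks like a stream:', typeof audio.pipe === 'function');
  console.log('looks like a buffer:', Buffer.isBuffer(audio));
});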
Steps To Reproduce
You can test this code with an array of four randomly generated text strings with respective byte sizes of 3975, 3863, 3974 and 3629 bytes - here is a pastebin of that array.
// dependencies assumed by the snippets below (not shown in the original question);
// `app` is an existing Express app, and iam_apikey / tts_service_url are credentials defined elsewhere
const fs = require("fs");
const path = require("path");
const TextToSpeechV1 = require("ibm-watson/text-to-speech/v1");

// route handler
app.route("/api/:api_version/tts")
  .get(api_tts_get);

// route handler middleware
const api_tts_get = async (req, res) => {

  var query_parameters = req.query;
  var file_name = query_parameters.file_name;
  var text_string_array = text_string_array; // assign the array of text strings here, eg: https://pastebin.com/raw/JkK8ehwV

  var absolute_path = path.join(__dirname, "/src/temp_audio/", file_name);
  var relative_path = path.join("./src/temp_audio/", file_name); // path relative to server root

  // for each string in an array, send it to the watson api
  var promises = text_string_array.map(text_string => {
    return new Promise((resolve, reject) => {
      // credentials
      var textToSpeech = new TextToSpeechV1({
        iam_apikey: iam_apikey,
        url: tts_service_url
      });
      // params
      var synthesizeParams = {
        text: text_string,
        accept: 'audio/mp3',
        voice: 'en-US_AllisonV3Voice'
      };
      // make request
      textToSpeech.synthesize(synthesizeParams, (err, audio) => {
        if (err) {
          console.log("synthesize - an error occurred: ");
          return reject(err);
        }
        resolve(audio);
      });
    });
  });

  try {
    // wait for all responses
    var audio_files = await Promise.all(promises);
    var audio_files_length = audio_files.length;

    var write_stream = fs.createWriteStream(`${relative_path}.mp3`);

    audio_files.forEach((audio, index) => {
      // if this is the last value in the array,
      // pipe it to write_stream;
      // when finished, the readable stream will emit 'end',
      // then the .end() method will be called on write_stream,
      // which will trigger the 'finish' event on write_stream
      if (index == audio_files_length - 1) {
        audio.pipe(write_stream);
      }
      // if not the last value in the array,
      // pipe to write_stream and leave it open
      else {
        audio.pipe(write_stream, { end: false });
      }
    });

    write_stream.on('finish', function() {
      // download the file (using absolute_path)
      res.download(`${absolute_path}.mp3`, (err) => {
        if (err) {
          console.log(err);
        }
        // delete the file (using relative_path)
        fs.unlink(`${relative_path}.mp3`, (err) => {
          if (err) {
            console.log(err);
          }
        });
      });
    });
  } catch (err) {
    console.log("there was an error getting tts");
    console.log(err);
  }
}
The official example shows:
textToSpeech.synthesize(synthesizeParams)
  .then(audio => {
    audio.pipe(fs.createWriteStream('hello_world.mp3'));
  })
  .catch(err => {
    console.log('error:', err);
  });
which seems to work fine for single requests, but not for multiple requests, as far as I can tell.
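Based on that example, here is a minimal sketch for verifying a single request end to end (assuming the same textToSpeech instance and synthesizeParams as above, and the promise form of synthesize shown in the official example; the output file name is just a placeholder):

(async () => {
  const audio = await textToSpeech.synthesize(synthesizeParams);
  const out = fs.createWriteStream('single_request_test.mp3'); // hypothetical file name
  audio.pipe(out);
  // wait for the writable stream to flush everything to disk before using the file
  await new Promise((resolve, reject) => {
    out.on('finish', resolve);
    out.on('error', reject);
  });
  console.log('single request written completely');
})().catch(err => console.log('error:', err));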
Research
concerning readable and writeable streams, readable stream modes (flowing and paused), 'data', 'end', 'drain' and 'finish' events, pipe(), fs.createReadStream() and fs.createWriteStream()
Almost all Node.js applications, no matter how simple, use streams in some manner...
const server = http.createServer((req, res) => {
  // `req` is an http.IncomingMessage, which is a Readable Stream
  // `res` is an http.ServerResponse, which is a Writable Stream
  let body = '';
  // get the data as utf8 strings.
  // if an encoding is not set, Buffer objects will be received.
  req.setEncoding('utf8');
  // readable streams emit 'data' events once a listener is added
  req.on('data', (chunk) => {
    body += chunk;
  });
  // the 'end' event indicates that the entire body has been received
  req.on('end', () => {
    try {
      const data = JSON.parse(body);
      // write back something interesting to the user:
      res.write(typeof data);
      res.end();
    } catch (er) {
      // uh oh! bad json!
      res.statusCode = 400;
      return res.end(`error: ${er.message}`);
    }
  });
});
https://nodejs.org/api/stream.html#stream_api_for_stream_consumers
Readable streams have two main modes that affect the way we can consume them...they can be either in the paused mode or in the flowing mode. All readable streams start in the paused mode by default but they can be easily switched to flowing and back to paused when needed...just adding a data event handler switches a paused stream into flowing mode and removing the data event handler switches the stream back to paused mode.
https://www.freecodecamp.org/news/node-js-streams-everything-you-need-to-know-c9141306be93
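To make sure I understand the two modes, a small sketch (hypothetical file path):

const fs = require('fs');

const readable = fs.createReadStream('./example.txt'); // hypothetical file

// attaching a 'data' handler switches the stream into flowing mode
readable.on('data', (chunk) => {
  console.log(`received ${chunk.length} bytes`);
});

// pause() switches it back to paused mode, resume() back to flowing mode
readable.pause();
setTimeout(() => readable.resume(), 1000);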
Here’s a list of the important events and functions that can be used with readable and writable streams
The most important events on a readable stream are:
The data event, which is emitted whenever the stream passes a chunk of data to the consumer
The end event, which is emitted when there is no more data to be consumed from the stream.
The most important events on a writable stream are:
The drain event, which is a signal that the writable stream can receive more data.
The finish event, which is emitted when all data has been flushed to the underlying system.
https://www.freecodecamp.org/news/node-js-streams-everything-you-need-to-know-c9141306be93
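All four events in one sketch (hypothetical file paths), wiring a readable stream to a writable stream by hand:

const fs = require('fs');

const readable = fs.createReadStream('./input.txt');   // hypothetical paths
const writable = fs.createWriteStream('./output.txt');

readable.on('data', (chunk) => {
  // 'data' fires for every chunk the readable stream passes to the consumer
  const ok = writable.write(chunk);
  if (!ok) {
    // back-pressure: stop reading until the writable stream drains
    readable.pause();
  }
});

writable.on('drain', () => {
  // 'drain' signals the writable stream can receive more data
  readable.resume();
});

readable.on('end', () => {
  // 'end' fires when there is no more data to be consumed
  writable.end();
});

writable.on('finish', () => {
  // 'finish' fires once all data has been flushed to the underlying system
  console.log('all data written');
});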
.pipe() takes care of listening for 'data' and 'end' events from the fs.createReadStream().
https://github.com/substack/stream-handbook#why-you-should-use-streams
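In other words, the manual 'data' / 'end' wiring in the sketch above collapses to a single line (hypothetical file paths):

const fs = require('fs');
fs.createReadStream('./input.txt').pipe(fs.createWriteStream('./output.txt')); // hypothetical paths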
.pipe() is just a function that takes a readable source stream src and hooks the output to a destination writable stream dst
https://github.com/substack/stream-handbook#pipe
The return value of the pipe() method is the destination stream
https://flaviocopes.com/nodejs-streams/#pipe
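Because pipe() returns the destination stream, calls can be chained, for example through a transform stream (hypothetical file paths):

const fs = require('fs');
const zlib = require('zlib');

// readable file -> gzip transform -> compressed output file
fs.createReadStream('./input.txt')        // hypothetical paths
  .pipe(zlib.createGzip())
  .pipe(fs.createWriteStream('./input.txt.gz'));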
By default, stream.end() is called on the destination Writable stream when the source Readable stream emits 'end', so that the destination is no longer writable. To disable this default behavior, the end option can be passed as false, causing the destination stream to remain open:
https://nodejs.org/api/stream.html#stream_readable_pipe_destination_options
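A small sketch of that option (hypothetical file paths) - with { end: false } the destination stays open and has to be ended manually:

const fs = require('fs');

const reader = fs.createReadStream('./input.txt');   // hypothetical paths
const writer = fs.createWriteStream('./output.txt');

// the writable stream is NOT ended when the readable emits 'end'
reader.pipe(writer, { end: false });

reader.on('end', () => {
  // the destination stays open, so it must be ended manually
  writer.end('Goodbye\n');
});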
The 'finish' event is emitted after the stream.end() method has been called, and all data has been flushed to the underlying system.
const writer = getWritableStreamSomehow();
for (let i = 0; i < 100; i++) {
  writer.write(`hello, #${i}!\n`);
}
writer.end('This is the end\n');
writer.on('finish', () => {
  console.log('All writes are now complete.');
});
https://nodejs.org/api/stream.html#stream_event_finish
If you're trying to read multiple files and pipe them to a writable stream, you have to pipe each one to the writable stream and pass end: false when doing it, because by default, a readable stream ends the writable stream when there's no more data to be read. Here's an example:
var ws = fs.createWriteStream('output.pdf');
fs.createReadStream('pdf-sample1.pdf').pipe(ws, { end: false });
fs.createReadStream('pdf-sample2.pdf').pipe(ws, { end: false });
fs.createReadStream('pdf-sample3.pdf').pipe(ws);
https://stackoverflow.com/a/30916248
You want to add the second read into an event listener for the first read to finish...
var a = fs.createReadStream('a');
var b = fs.createReadStream('b');
var c = fs.createWriteStream('c');
a.pipe(c, { end: false });
a.on('end', function() {
  b.pipe(c);
});
https://stackoverflow.com/a/28033554
A Brief History of Node Streams - part one and two.
Related Google search:
how to pipe multiple readable streams to a single writable stream? nodejs
Questions covering the same or similar topic, without authoritative answers (or might be 'outdated'):
How to pipe multiple ReadableStreams to a single WriteStream?
Piping to same Writable stream twice via different Readable stream
Pipe multiple files to one response
Creating a Node.js stream from two piped streams
To consume a readable stream, we can use the pipe / unpipe methods, or the read / unshift / resume methods. To consume a writable stream, we can make it the destination of pipe / unpipe , or just write to it with the write method and call the end method when we're done.
The ReadableStream() constructor The constructor takes two objects as parameters. The first object is required, and creates a model in JavaScript of the underlying source the data is being read from. The second object is optional, and allows you to specify a custom queuing strategy to use for your stream.
createReadStream() allows you to open up a readable stream in a very simple manner. All you have to do is pass the path of the file to start streaming in. It turns out that the response (as well as the request) objects are streams. So we will use this fact to create a http server that streams the files to the client.
A duplex stream is a stream that implements both a readable and a writable. These streams allow data to pass through. Readable streams will pipe data into a duplex stream, and the duplex stream can also write that data. So duplex streams represent the middle sections of pipelines.
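A small sketch of a duplex stream sitting in the middle of a pipeline, using stream.PassThrough (hypothetical file paths):

const fs = require('fs');
const { PassThrough } = require('stream');

// PassThrough is the simplest duplex stream: whatever is written to it
// can be read back out, so it can sit in the middle of a pipeline
const middle = new PassThrough();

fs.createReadStream('./input.txt')        // hypothetical paths
  .pipe(middle)
  .pipe(fs.createWriteStream('./output.txt'));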
The core problem to solve here is asynchronicity. You almost had it: the problem with the code you posted is that you are piping all source streams in parallel & unordered into the target stream. This means data chunks will flow randomly from different audio streams - even your end event will outrace the pipes without end, closing the target stream too early, which might explain why it increases after you re-open it.
What you want is to pipe them sequentially - you even posted the solution when you quoted
You want to add the second read into an event listener for the first read to finish...
or as code:
a.pipe(c, { end:false });
a.on('end', function() {
b.pipe(c);
}
This will pipe the source streams in sequential order into the target stream.
Taking your code, this would mean replacing the audio_files.forEach loop with:
await Bluebird.mapSeries(audio_files, async (audio, index) => {
  const isLastIndex = index == audio_files_length - 1;
  audio.pipe(write_stream, { end: isLastIndex });
  return new Promise(resolve => audio.on('end', resolve));
});
Note the usage of bluebird.js mapSeries here.
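If you would rather not add Bluebird as a dependency, the same sequential piping can be sketched with a plain for...of loop inside your async handler (same variables as the snippet above):

for (const [index, audio] of audio_files.entries()) {
  const isLastIndex = index == audio_files_length - 1;
  audio.pipe(write_stream, { end: isLastIndex });
  // wait for this source stream to end before piping the next one
  await new Promise(resolve => audio.on('end', resolve));
}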
Further advice regarding your code: use const & let instead of var, and consider using camelCase.
Further reading, limitations of combining native node streams: https://github.com/nodejs/node/issues/93