
Set encoding for a nodeJS Transform stream in a safe manner

According to the Node.js docs (v5.10.0) for Readable streams:

it is better to use readable.setEncoding('utf8') than to work with buffers directly using buf.toString(encoding). This is because "multi-byte characters (...) would otherwise be potentially mangled. If you want to read the data as strings, always use this method."
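To make the quoted warning concrete, here is a minimal sketch of how a multi-byte character can be mangled when each chunk is decoded on its own. The sample text and the split position are assumptions chosen for illustration; a real stream may split the data anywhere.

```javascript
// '€' is encoded in UTF-8 as three bytes: e2 82 ac.
const whole = Buffer.from('5 €', 'utf8');

// Simulate a stream that happens to split the data in the middle of '€'.
const chunks = [whole.subarray(0, 3), whole.subarray(3)];

// Decoding each chunk separately replaces the broken halves with U+FFFD.
const perChunk = chunks.map((c) => c.toString('utf8')).join('');

// Decoding the complete data gives the expected text.
const joined = Buffer.concat(chunks).toString('utf8');

console.log(JSON.stringify(perChunk));
console.log(JSON.stringify(joined)); // "5 €"
```

The per-chunk result contains replacement characters where the euro sign should be, which is the problem setEncoding() is meant to avoid.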

My question is about how to implement this using the new API for Transform streams, which no longer requires the verbose inheritance-based approach.

So, for example, this works as a way to transform stdin into an upper-case string:

const transform = require("stream").Transform({
  transform: function(chunk, encoding, next) {
    this.push(chunk.toString().toUpperCase());
    next();
  }
});

process.stdin.pipe(transform).pipe(process.stdout);

However, this would appear to go against the recommendation of not using toString() on buffers. I've tried modifying the Transform instance by setting encoding to "utf-8" like this:

const transform = require("stream").Transform({
  transform: function(chunk, encoding, next) {
    this.push(chunk.toUpperCase()); //chunk is a buffer so this doesn't work
    next();
  }
});
transform.setEncoding("utf-8");

process.stdin.pipe(transform).pipe(process.stdout);

Upon inspection, transform in the first case has an encoding of null, whereas in the second it has indeed changed to "utf-8". Yet the chunk passed to the transform function is still a buffer. I thought that setting the encoding would let me skip the toString() call, but this is not the case.

I've also tried extending the read method as in the Readable and Duplex examples, but that is not allowed.

Is there a way to get rid of toString()?

cortopy asked Apr 02 '16 12:04


2 Answers

You are right. Using Buffer#toString directly in your _transform method is bad, because a multi-byte character that is split across two chunks gets mangled. However, setEncoding is meant to be used by readable stream consumers (i.e. the code that reads from your transform stream). You are implementing a transform stream, and setEncoding doesn't change the input of your _transform method for you.
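As a rough illustration of that distinction, the sketch below calls setEncoding on a pass-through Transform and records what each side sees. The names `passThrough` and `inputWasBuffer` are made up for this example, and it relies on a synchronous transform being run during write().

```javascript
const { Transform } = require('stream');

const inputWasBuffer = [];
const passThrough = new Transform({
  transform(chunk, encoding, next) {
    // Record what the transform function actually receives.
    inputWasBuffer.push(Buffer.isBuffer(chunk));
    next(null, chunk); // forward the chunk unchanged
  }
});

// This only affects what consumers of the readable side get back.
passThrough.setEncoding('utf8');

passThrough.write(Buffer.from('abc'));
const output = passThrough.read();

console.log(inputWasBuffer); // [ true ]
console.log(typeof output);  // string
```

The transform function still receives a Buffer, while the consumer calling read() gets a decoded string.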

Internally, readable streams use the StringDecoder if the consumer activated auto-decoding. You can use it in your transform method as well.

Here's a code comment explaining how it works:

[StringDecoder] decodes the given buffer and returns it as JS string that is guaranteed to not contain any partial multi-byte characters. Any partial character found at the end of the buffer is buffered up, and will be returned when calling write again with the remaining bytes.
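A small sketch of the behaviour described in that comment, using the string_decoder module directly. The input bytes and the place where they are split are assumptions for illustration only.

```javascript
const { StringDecoder } = require('string_decoder');

const decoder = new StringDecoder('utf8');
const euro = Buffer.from('€', 'utf8'); // three bytes: e2 82 ac

// The first chunk ends with an incomplete character, so the decoder
// returns the complete part and holds the partial byte back.
const first = decoder.write(Buffer.concat([Buffer.from('5 '), euro.subarray(0, 1)]));

// The remaining bytes complete the character.
const second = decoder.write(euro.subarray(1));

// end() returns whatever is still buffered (nothing in this case).
const rest = decoder.end();

console.log(JSON.stringify(first));  // "5 "
console.log(JSON.stringify(second)); // "€"
console.log(JSON.stringify(rest));   // ""
```

Calling end() matters when the input stops in the middle of a character; in a Transform stream the natural place for it is the flush function.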

So, your example could be rewritten as follows:

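const StringDecoder = require('string_decoder').StringDecoder;
const transform = require("stream").Transform({
  transform: function(chunk, encoding, next) {
    // Create the decoder lazily, once per stream
    if (!this.myStringDecoder) this.myStringDecoder = new StringDecoder('utf8');
    // write() returns only complete characters; a trailing partial one is kept for the next chunk
    this.push(this.myStringDecoder.write(chunk).toUpperCase());
    next();
  },
  flush: function(done) {
    // Emit any bytes still held by the decoder when the input ends
    if (this.myStringDecoder) this.push(this.myStringDecoder.end().toUpperCase());
    done();
  }
});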

process.stdin.pipe(transform).pipe(process.stdout);
Marcel Klehr answered Oct 10 '22 12:10


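Pass decodeStrings: false in the options given to the Transform constructor. With this option, strings written to the stream are handed to transform as strings instead of being converted to buffers first. Chunks that are already buffers are still passed through as buffers, so the source has to emit strings, for example by calling process.stdin.setEncoding("utf8") before piping:

const transform = require("stream").Transform({
    transform: function(chunk, encoding, next) {
        this.push(chunk.toUpperCase()); // chunk is a string only if the writer supplied a string
        next();
    },
    decodeStrings: false
});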
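To see what decodeStrings: false does and does not change, the following sketch writes one string and one Buffer to such a stream and records the type of each chunk. The names `upper` and `kinds` are hypothetical, and the example assumes a synchronous transform so that the results can be read straight after the writes.

```javascript
const { Transform } = require('stream');

const kinds = [];
const upper = new Transform({
  decodeStrings: false,
  transform(chunk, encoding, next) {
    if (typeof chunk === 'string') {
      kinds.push('string');
      next(null, chunk.toUpperCase());
    } else {
      // Buffers are not decoded by this option; forward them untouched.
      kinds.push('buffer');
      next(null, chunk);
    }
  }
});

upper.write('héllo');            // arrives in transform as a string
upper.write(Buffer.from('abc')); // still arrives as a Buffer

const output = upper.read().toString('utf8');

console.log(kinds);  // [ 'string', 'buffer' ]
console.log(output); // HÉLLOabc
```

So the option helps only when the writing side supplies strings, which is why it is usually combined with setEncoding() on the upstream readable.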
Deepak Pathak answered Oct 10 '22 13:10