I'm connecting to an external WebSocket API using the Node ws library (Node 10.8.0 on Ubuntu 16.04). I've got a listener which simply parses the JSON and passes it to the callback:
    this.ws.on('message', (rawdata) => {
        let data = null;
        try {
            data = JSON.parse(rawdata);
        } catch (e) {
            console.log('Failed parsing the following string as json: ' + rawdata);
            return;
        }
        mycallback(data);
    });
I now receive errors in which the rawdata looks as follows (I formatted it and removed irrelevant contents):
�~A { "id": 1, etc.. }�~� { "id": 2, etc..
I then wondered: what are these characters? Seeing the structure, I initially thought that the first weird sign must be the opening bracket of an array ([) and the second one a comma (,), so that it creates an array of objects.
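One way to find out what the weird signs actually are is to look at their byte values rather than at how the terminal renders them. Below is a minimal sketch, assuming rawdata arrives as a Buffer (if it has already been decoded to a string, the original bytes may have been replaced by U+FFFD). describeLeadingBytes is a hypothetical helper name, not part of ws:

```javascript
// Hypothetical helper: list the bytes that come before the first '{'
// so they can be inspected as hex instead of as garbled terminal output.
function describeLeadingBytes(rawdata) {
  // Assumption: rawdata is a Buffer; anything else is converted via its string form.
  const buf = Buffer.isBuffer(rawdata) ? rawdata : Buffer.from(String(rawdata), 'utf8');
  const end = buf.indexOf(0x7b); // 0x7b is the byte for '{'
  const prefix = end === -1 ? buf : buf.subarray(0, end);
  // Format every byte as a two-digit hex value, e.g. '0x0a'
  return Array.from(prefix, (b) => '0x' + b.toString(16).padStart(2, '0'));
}
```

Calling it inside the catch block and logging the result next to the failed string would show whether the separators are always the same bytes.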
I then investigated the problem further by writing the rawdata to a file whenever it encounters a JSON parsing error. In an hour or so it has saved about 1500 of these error files, meaning this happens a lot. I cat'ed a couple of these files in the terminal, of which I uploaded an example below:
A few things are interesting here:
I'm not very experienced with websockets, but could it be that my websocket somehow receives a stream of messages that it concatenates together, with these weird signs as separators, and then randomly cuts off the last message? Maybe because I'm getting a constant, very fast stream of messages?
Or could it be because of an error (or functionality) server side in that it combines those individual messages?
Does anybody know what's going on here? All tips are welcome!
[EDIT]
@bendataclear suggested interpreting it as UTF-8. So I did, and I pasted a screenshot of the results below. The first print is as it is, and the second one is interpreted as UTF-8. To me this doesn't look like anything. I could of course convert to UTF-8 and then split by those characters. Although the last message is always cut off, this would at least make some of the messages readable. Other ideas are still welcome though.
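The splitting idea can be sketched without knowing the exact separator characters: scan the decoded text for balanced top-level { } pairs, parse each complete object, and report whatever is left over as the cut-off tail. This is only a sketch, under the assumptions that every message is a JSON object and that the separator bytes never contain a '{' themselves. salvageJsonObjects is a hypothetical name:

```javascript
// Hypothetical helper: salvage the complete JSON objects from a payload in
// which several objects are glued together with non-JSON separator characters.
function salvageJsonObjects(rawdata) {
  const text = Buffer.isBuffer(rawdata) ? rawdata.toString('utf8') : String(rawdata);
  const objects = [];
  let depth = 0;        // current nesting depth of { }
  let start = -1;       // index where the current top-level object began
  let inString = false; // true while inside a JSON string literal
  let escaped = false;  // true right after a backslash inside a string

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (depth === 0) {
      // Outside any object: skip separator characters until the next '{'
      if (ch === '{') { depth = 1; start = i; }
      continue;
    }
    if (inString) {
      // Braces inside string values must not change the depth
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}') {
      depth--;
      if (depth === 0) {
        try {
          objects.push(JSON.parse(text.slice(start, i + 1)));
        } catch (e) {
          // Balanced braces but still not valid JSON: skip this chunk
        }
        start = -1;
      }
    }
  }
  // Anything still open at the end is the message that was cut off
  const truncatedTail = depth > 0 ? text.slice(start) : '';
  return { objects, truncatedTail };
}
```

The returned truncatedTail could be kept and prepended to the next rawdata, in case the rest of that message shows up at the start of the following event; whether it does is something to verify against the saved error files.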
To intercept the messages, you will have to spy on the onmessage = fn and addEventListener("message", fn) calls. To be able to modify the onmessage we have to override the global WebSocket in the first place.
It's not possible for them to arrive in your application out of order. Anything can happen on the network, but TCP will only present you the bytes in the order they were sent.
WebSocket enables bidirectional, message-oriented streaming of text and binary data between client and server.
WebSocket was first referenced as TCPConnection in the HTML5 specification, as a placeholder for a TCP-based socket API. In June 2008, a series of discussions were led by Michael Carter that resulted in the first version of the protocol known as WebSocket.
My assumption is that you're working only with English/ASCII characters and that something messed up the stream. If that is so, and there are no special characters, I suggest you pass the entire JSON string into this function:
    function cleanString(input) {
        var output = "";
        for (var i = 0; i < input.length; i++) {
            // keep only ASCII characters (char codes 0 to 127)
            if (input.charCodeAt(i) <= 127) {
                output += input.charAt(i);
            }
        }
        console.log(output);
        return output;
    }

    // example
    cleanString("�~�");
You can also refer to the question "How to remove invalid UTF-8 characters from a JavaScript string?"
EDIT
From RFC 6455 (The WebSocket Protocol), published by the Internet Engineering Task Force (IETF):
A common class of security problems arises when sending text data using the wrong encoding. This protocol specifies that messages with a Text data type (as opposed to Binary or other types) contain UTF-8-encoded data. Although the length is still indicated and applications implementing this protocol should use the length to determine where the frame actually ends, sending data in an improper
The "Payload data" is text data encoded as UTF-8. Note that a particular text frame might include a partial UTF-8 sequence; however, the whole message MUST contain valid UTF-8. Invalid UTF-8 in reassembled messages is handled as described in "Handling Errors in UTF-8-Encoded Data", which states: "When an endpoint is to interpret a byte stream as UTF-8 but finds that the byte stream is not, in fact, a valid UTF-8 stream, that endpoint MUST Fail the WebSocket Connection. This rule applies both during the opening handshake and during subsequent data exchange."
I really believe that your error (or functionality) comes from the server side, which combines your individual messages. So I suggest you come up with logic that ensures all your characters are converted from Unicode to ASCII by first encoding them as UTF-8. You might also want to run npm install --save-optional utf-8-validate to efficiently check whether a message contains valid UTF-8, as required by the spec.
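If you would rather not add a dependency just to experiment, a similar check can be sketched with Node's built-in TextDecoder in fatal mode. This is a stand-in for illustration, not how utf-8-validate or ws works internally. isValidUtf8 is a hypothetical helper name, and the sketch assumes a Node build with ICU support, since the fatal option is not available without it:

```javascript
const { TextDecoder } = require('util');

// Hypothetical helper: returns true when the bytes form valid UTF-8.
// Assumption: buf is a Buffer (or another Uint8Array) holding the raw payload.
function isValidUtf8(buf) {
  try {
    // With fatal: true the decoder throws on invalid input instead of
    // silently substituting U+FFFD replacement characters.
    new TextDecoder('utf-8', { fatal: true }).decode(buf);
    return true;
  } catch (e) {
    return false;
  }
}
```

A payload that fails this check can be written to disk as raw bytes for later inspection instead of being passed to JSON.parse.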
You might also want to add an if condition to help you do some checks:
    this.ws.on('message', (rawdata) => {
        // Assumption: a ws version before v8, which delivers text frames as strings
        if (typeof rawdata === 'string') {
            // accept only text
        }
    });
I hope this helps.