How to deal with non-UTF-8 encoded URLs in Express

Tags: javascript, node.js, iis, url-encoding, bing

We have a Node.js application which we recently moved from IIS 7 (via iisnode) to Linux (Elastic Beanstalk). Since the switch, our application has been receiving a lot of non-UTF-8 URLs (mainly from crawlers), such as:

Bj%F6rk

IIS was converting this to Björk. It is now passed to our application as-is, and our web framework (Express) eventually calls down to:

decodeURIComponent('Bj%F6rk');
URIError: URI malformed
    at decodeURIComponent (native)
    at repl:1:1
    at REPLServer.self.eval (repl.js:110:21)
    at repl.js:249:20
    at REPLServer.self.eval (repl.js:122:7)
    at Interface.<anonymous> (repl.js:239:12)
    at Interface.emit (events.js:95:17)
    at Interface._onLine (readline.js:203:10)
    at Interface._line (readline.js:532:8)
    at Interface._ttyWrite (readline.js:761:14)

Is there a recommended, safe way to perform the same conversion as IIS before passing the URL string to Express?

Bearing in mind that:

- We are receiving requests to these badly encoded URLs,
- There is a way to decode them using the deprecated unescape JavaScript function, and
- The majority of the requests to these URLs are coming from Bing Bot, and we want to minimise any adverse effect on our search rankings,

our questions are:

- Should we really be doing this for all incoming URLs?
- Are there any security or performance implications we should be concerned about?
- Should we be concerned about unescape being removed in the near future?
- Is there a better / safer way to solve this problem? (Yes, we did read the MDN article linked to above.)

asked Sep 18 '15 13:09

Will Munn

1 Answer

Should we really be doing this for all incoming URLs?

No, you shouldn't. The request being made uses non-UTF-8 URI components. That shouldn't be your problem.

Are there any security or performance implications we should be concerned about?

The encoding of a URI component is not in itself a security issue. Injection attempts via query string or path params are, but that's another subject. In terms of performance, every middleware adds a little time to each response, but I wouldn't worry about that. If you want to decode the URI yourself, just do it; the cost of decoding a short string is negligible.

Should we be concerned about unescape being removed in the near future?

Actually, you should. unescape is deprecated. If you still want to use it, check that it exists first, i.e. 'unescape' in global. You can also use the built-in alternative require('querystring').unescape(), which won't produce the same result in every case but won't throw a URIError (not recommended, though).
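To make the difference between these functions concrete, here is a small sketch that can be run in Node. The variable names are made up for illustration; the only calls used are the built-in decodeURIComponent, the global unescape, and querystring.unescape.

```javascript
// Sketch: compare how the three decoders treat a Latin-1 percent-escape.
const querystring = require('querystring');

// %F6 is "ö" in ISO-8859-1, but it is not a valid UTF-8 sequence on its own.
const input = 'Bj%F6rk';

// 1. decodeURIComponent expects UTF-8 and throws on anything else.
let strictError = null;
try {
    decodeURIComponent(input);
} catch (err) {
    strictError = err; // URIError: URI malformed
}

// 2. The deprecated global unescape maps each %XX straight to a character,
//    so guard its use in case a runtime ever ships without it.
const legacyResult = typeof unescape === 'function' ? unescape(input) : null;

// 3. querystring.unescape tries decodeURIComponent first and falls back to a
//    decoder that does not throw; its output can differ from unescape's.
const lenientResult = querystring.unescape(input);

console.log(strictError instanceof URIError); // true
console.log(legacyResult);                    // Björk
console.log(typeof lenientResult);            // string
```

Only the global unescape reproduces the Björk that IIS used to produce, which is why the question singles it out; the other two either throw or decode differently.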

To minimise any adverse effect on search rankings:

Determine which status code your Express app returns in these cases. It could be 500 (INTERNAL SERVER ERROR), which will look bad, or 404 (NOT FOUND), which will tell the crawler you don't have a result for the query (which may not be true).

In these cases, I suggest you override this by returning a client error such as 400 (BAD REQUEST) instead, since the origin of the problem is a malformed URI component in the request: it should be in UTF-8, but it isn't. That is for the crawler/bot to worry about.

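// Error-handling middleware: respond with 400 BAD REQUEST for malformed URIs.
// Register it after your routes so that Express passes errors to it.
app.use(function (err, req, res, next) {
    if (err instanceof URIError) {
        return res.status(400).send();
    }
    // Pass any other error on; otherwise the request would hang.
    next(err);
});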
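
To see how such a handler behaves without starting a server, here is a minimal sketch that calls it with stand-in request and response objects. The names badRequestOnUriError and makeStubResponse are made up for illustration, and the stub only mimics the two response methods used here. This version also forwards every non-URIError with next(err), so that unrelated failures are not left without a response.

```javascript
// Hypothetical handler name; it has the (err, req, res, next) signature that
// Express uses for error-handling middleware.
function badRequestOnUriError(err, req, res, next) {
    if (err instanceof URIError) {
        return res.status(400).send();
    }
    return next(err); // anything else goes to the next error handler
}

// Minimal stand-in for an Express response that records what was sent.
function makeStubResponse() {
    const res = { statusCode: null, sent: false };
    res.status = function (code) { res.statusCode = code; return res; };
    res.send = function () { res.sent = true; return res; };
    return res;
}

// Case 1: a decoding failure is answered with 400.
const resA = makeStubResponse();
let forwardedA = null;
badRequestOnUriError(new URIError('URI malformed'), {}, resA, function (e) {
    forwardedA = e;
});

// Case 2: an unrelated error is passed along untouched.
const resB = makeStubResponse();
const otherError = new Error('database unavailable');
let forwardedB = null;
badRequestOnUriError(otherError, {}, resB, function (e) {
    forwardedB = e;
});

console.log(resA.statusCode, forwardedA);          // 400 null
console.log(resB.sent, forwardedB === otherError); // false true
```

In a real app the same function would be registered with app.use(badRequestOnUriError) after the routes.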

Above all, trying to return a result for a malformed URI has other side effects. First, you'll be allowing a bad request, which can't be good :). Second, it means you have a result for a bad URI; crawlers/bots will store it when they get a 200 OK response, and it will spread. Then you'll have to deal with even more bad requests.

To conclude: don't decode via unescape. Express already tries to decode with the proper function, decodeURIComponent. If that fails, let it be.

answered Sep 30 '22 06:09

Onur Yıldırım
