
Enumerate regular expressions via UglifyJS

I have some JavaScript code, from which I need to find start+end indexes of every literal regular expression.

How can such information be extracted from UglifyJS?

var uglify = require('uglify-js');
var code = "func(1/2, /hello/);";
var parsed = uglify.parse(code);

The structure I get in the variable parsed is very complex, and all I need is an array like [{startIdx, endIdx}, {startIdx, endIdx}], with one entry per literal regular expression.

P.S. If somebody thinks that the same task can be accomplished in a way that's better than via UglifyJS, you are welcome to suggest!

UPDATE

I know that if I dig deeper into the parsed structure, I can find an object like this for every regular expression:

AST_Token {
     raw: '/hello/',
     file: null,
     comments_before: [],
     nlb: false,
     endpos: 17,
     endcol: 17,
     endline: 1,
     pos: 10,
     col: 10,
     line: 1,
     value: /hello/,
     type: 'regexp'
}

I need to figure out how to pull all such objects from the parsed structure, so I can compile the list of position indexes.
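One brute-force way to do that is to visit every property of the structure recursively and keep whatever looks like a regexp token. The sketch below does not depend on UglifyJS: mockTree is a hypothetical plain-object stand-in, not the real shape of the parsed tree, and only the token fields (type, raw, pos, endpos) are taken from the dump above. collectRegExpTokens is likewise an illustrative name.

```javascript
// Hypothetical stand-in for a parsed tree: plain nested objects and arrays.
// Only the token fields (type, raw, pos, endpos) mirror the dump above.
var mockTree = {
    body: [{
        args: [
            { start: { type: 'num', raw: '1', pos: 5, endpos: 6 } },
            { start: { type: 'regexp', raw: '/hello/', pos: 10, endpos: 17 } }
        ]
    }]
};

function collectRegExpTokens(root) {
    var result = [];
    var seen = new Set(); // guards against cycles and tokens reachable twice
    (function visit(obj) {
        if (obj === null || typeof obj !== 'object' || seen.has(obj)) {
            return;
        }
        seen.add(obj);
        if (obj.type === 'regexp' && typeof obj.pos === 'number') {
            result.push({ startIdx: obj.pos, endIdx: obj.endpos });
            return; // no need to look inside the token itself
        }
        Object.keys(obj).forEach(function (key) {
            visit(obj[key]);
        });
    })(root);
    return result;
}

console.log(collectRegExpTokens(mockTree)); // [ { startIdx: 10, endIdx: 17 } ]
```

The seen set matters because in a real tree the same token may be reachable through more than one property, and a blind traversal of every property is likely to be slow on large inputs.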

asked Dec 30 '15 07:12 by vitaly-t



1 Answer

I was given a very useful link to a blog post by the UglifyJS author, which pointed me in the right direction. Based on that post, I was able to change my enumeration code to the following:

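// Expects the result of uglify.parse(code), with uglify = require('uglify-js').
function enumRegEx(parsed) {
    var result = [];
    // The walker callback is invoked for every node in the tree.
    parsed.walk(new uglify.TreeWalker(function (obj) {
        if (obj instanceof uglify.AST_RegExp) {
            // pos/endpos are absolute offsets into the source; col/endcol are
            // offsets within the line and match them only on the first line.
            result.push({
                startIdx: obj.start.pos,
                endIdx: obj.end.endpos
            });
        }
    }));
    return result;
}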

Not only is this shorter while producing the same result, but it also runs almost instantly, within 10ms, which puts my previous result (430ms) to shame.

Now that is the result I was looking for! :)
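A note on which token fields to read: the dump in the question carries both col/endcol and pos/endpos. They are equal there only because the sample is a single line. The sketch below is dependency-free and shows how a column turns into an absolute index once the code spans several lines. lineColToIndex is a hypothetical helper, and it assumes 1-based lines, 0-based columns (as in the dump) and "\n" line breaks.

```javascript
// Convert a 1-based line and 0-based column (the convention seen in the
// token dump in the question) to an absolute index into the source.
// Assumes lines are separated by a single "\n".
function lineColToIndex(code, line, col) {
    var lines = code.split('\n');
    var index = 0;
    for (var i = 0; i < line - 1; i++) {
        index += lines[i].length + 1; // +1 for the "\n" itself
    }
    return index + col;
}

var twoLines = 'var a = 1;\nfunc(1/2, /hello/);';
var absolute = lineColToIndex(twoLines, 2, 10);
console.log(absolute);                               // 21
console.log(twoLines.slice(absolute, absolute + 7)); // /hello/
```

On the second line the regular expression still sits at column 10, but its absolute index is 21, so column values alone are not enough to slice the original source.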

UPDATE: In the end though, I found out that for this particular task esprima is a much better choice. It is much faster and has full ES6 support, unlike UglifyJS.

The very same task done via esprima, thanks to the excellent support from Ariya Hidayat:

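var esprima = require('esprima');

function parseRegEx(originalCode) {
    var result = [];
    // range: true adds [start, end] source offsets to every token;
    // the callback is invoked once per token.
    esprima.tokenize(originalCode, {loc: true, range: true}, function (obj) {
        if (obj.type === 'RegularExpression') {
            result.push({
                startIdx: obj.range[0],
                endIdx: obj.range[1]
            });
        }
    });
    return result;
}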

As you can see, with esprima you do not even need to parse the code: you pass in the original source, which esprima only tokenizes, and that is far faster.
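A quick way to sanity-check the resulting list is to slice the original source with each pair of indexes. This is a minimal sketch with no parser involved: the range is hard-coded from the token dump in the question (pos 10, endpos 17), extractLiterals is a hypothetical helper name, and the end index is treated as exclusive, which is what those numbers imply for this sample.

```javascript
var code = 'func(1/2, /hello/);';
// Hard-coded from the token dump in the question (pos: 10, endpos: 17).
var found = [{ startIdx: 10, endIdx: 17 }];

// Return the source text covered by each range; endIdx is exclusive.
function extractLiterals(source, ranges) {
    return ranges.map(function (r) {
        return source.slice(r.startIdx, r.endIdx);
    });
}

console.log(extractLiterals(code, found)); // [ '/hello/' ]
```

If a slice does not start and end with a slash (plus any flags), the indexes are probably columns or inclusive ends rather than absolute, exclusive offsets.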

answered Sep 28 '22 07:09 by vitaly-t
