How to extract a JavaScript function from a JavaScript file

I need to extract an entire JavaScript function from a script file. I know the name of the function, but I don't know what its contents may be. The function may be nested within any number of closures.

I need to have two output values:

  1. The entire body of the named function that I'm finding in the input script.
  2. The full input script with the found named function removed.

So, assume I'm looking for the findMe function in this input script:

function() {
  function something(x,y) {
    if (x == true) {
      console.log ("Something says X is true");
      // The regex should not find this:
      console.log ("function findMe(z) { var a; }");
    }
  }
  function findMe(z) {
    if (z == true) {
      console.log ("Something says Z is true");
    }
  }
  findMe(true);
  something(false,"hello");
}();

From this, I need the following two result values:

  1. The extracted findMe script

    function findMe(z) {
      if (z == true) {
        console.log ("Something says Z is true");
      }
    }
    
  2. The input script with the findMe function removed

    function() {
      function something(x,y) {
        if (x == true) {
          console.log ("Something says X is true");
          // The regex should not find this:
          console.log ("function findMe(z) { var a; }");
        }
      }
      findMe(true);
      something(false,"hello");
    }();
    

The problems I'm dealing with:

  1. The body of the function to find could contain any valid JavaScript code. The code or regex that finds it must be able to ignore values in strings, handle multiple nested block levels, and so forth.

  2. If the function definition to find is specified inside of a string, it should be ignored.

Any advice on how to accomplish something like this?

Update:

It looks like regex is not the right way to do this. I'm open to pointers to parsers that could help me accomplish this. I'm looking at Jison, but would love to hear about anything else.

Tauren asked Jul 05 '11 21:07


1 Answer

If the script is included in your page (something you weren't clear about) and the function is publicly accessible, then you can just get the source to the function with:

functionXX.toString();

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Function/toString

Other ideas:

1) Look at open source code that does either JS minification or JS pretty-printing. In both cases, the code has to "understand" the JS language in order to do its work in a fault-tolerant way. I doubt it's going to be pure regex, as the language is just a bit more complicated than that.

2) If you control the source at the server and want to modify a particular function in it, then just insert some new JS that replaces that function at runtime with your own function. That way, you let the JS engine identify the function for you and you just replace it with your own version.

3) For regex, here's what I've done. It is not foolproof, but it has worked for me in some build tools I use:

I run multiple passes (using regex in python):

  1. Remove all comments delimited with /* and */ (the regexes below also handle // comments).
  2. Remove all quoted strings.
  3. Now all that's left is non-string, non-comment JavaScript, so you should be able to run a regex directly against your function declaration.
  4. If you need the function source with strings and comments back in, you'll have to reconstitute it from the original, now that you know where the function begins and ends.
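
The passes above can be sketched in Python roughly as follows. This is not the build-tool code the answer refers to, just a minimal illustration under stated assumptions: `mask` and `extract_function` are hypothetical names, comments and strings are blanked out with spaces (rather than deleted) so that character offsets still line up with the original, and the function is assumed to be a plain `function name(...) { ... }` declaration. Regex literals and template literals are not recognized.

```python
import re

# Comments and quoted strings, matched in one left-to-right pass so that a
# quote inside a comment (or "//" inside a string) is not misread.
_MASKABLE = re.compile(
    r"""
    /\*.*?\*/                 # /* block comment */
    | //[^\n]*                # // line comment
    | "(?:\\.|[^"\\\n])*"     # double-quoted string
    | '(?:\\.|[^'\\\n])*'     # single-quoted string
    """,
    re.VERBOSE | re.DOTALL,
)


def mask(source):
    """Blank out comments and strings, keeping length and newlines intact."""
    def blank(match):
        return re.sub(r"[^\n]", " ", match.group(0))
    return _MASKABLE.sub(blank, source)


def extract_function(source, name):
    """Return (function_text, source_without_function).

    Returns (None, source) when no declaration of `name` is found.
    """
    masked = mask(source)
    decl = re.search(r"\bfunction\s+" + re.escape(name) + r"\s*\(", masked)
    if decl is None:
        return None, source

    # Walk the masked text from the first "{" after the declaration and
    # count braces; strings and comments can no longer confuse the count.
    depth = 0
    for pos in range(masked.index("{", decl.end()), len(masked)):
        if masked[pos] == "{":
            depth += 1
        elif masked[pos] == "}":
            depth -= 1
            if depth == 0:
                end = pos + 1
                # Slice the ORIGINAL text, so strings and comments survive.
                return source[decl.start():end], source[:decl.start()] + source[end:]
    raise ValueError("unbalanced braces after function " + name)
```

Calling `extract_function(script, "findMe")` on the sample from the question yields the findMe declaration and the script without it, while the string literal that merely mentions findMe is left alone. The indentation that preceded the declaration stays behind as a whitespace-only line, so a real tool would probably want to trim that as well.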

Here are the regexes I use (expressed in Python's verbose, multi-line format):

reStr = r"""
    (                               # capture the non-comment portion
        "(?:\\.|[^"\\])*"           # capture double quoted strings
        |
        '(?:\\.|[^'\\])*'           # capture single quoted strings
        |
        (?:[^/\n"']|/[^/*\n"'])+    # any code besides newlines or string literals
        |
        \n                          # newline
    )
    |
    (/\*  (?:[^*]|\*+[^*/])*   \*+/)    # /* comment */, tolerating runs of * such as **/
    |
    (?://(.*)$)                     # // single line comment
    $"""    

reMultiStart = r"""         # start of a multiline comment that doesn't terminate on this line
    (
        /\*                 # /* 
        (
            [^\*]           # any character that is not a *
            |               # or
            \*[^/]          # * followed by something that is not a /
        )*                  # any number of these
    )
    $"""

reMultiEnd = r"""           # end of a multiline comment that didn't start on this line
    (
        ^                   # start of the line
        (
            [^\*]           # any character that is not a *
            |               # or
            \*+[^/]         # * followed by something that is not a /
        )*                  # any number of these
        \*/                 # followed by a */
    )
"""

import re

# Single-line comments that start with "// /" are ones we should keep.
regExSingleKeep = re.compile("// /")
regExMain = re.compile(reStr, re.VERBOSE)
regExMultiStart = re.compile(reMultiStart, re.VERBOSE)
regExMultiEnd = re.compile(reMultiEnd, re.VERBOSE)
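
How the compiled patterns are applied is not shown here. One plausible use, sketched under that assumption, is to run the main pattern over each line and keep only what the first capture group matched (code and string literals), which drops the comments. So that the snippet runs on its own it uses `RE_MAIN`, a condensed variant of reStr without capture groups for the comments; `strip_comments` is a hypothetical helper name.

```python
import re

# Condensed variant of reStr: group 1 is code or a string literal,
# the two remaining alternatives are comments.
RE_MAIN = re.compile(
    r"""
    (                               # group 1: the non-comment portion
        "(?:\\.|[^"\\])*"           # double quoted string
        |
        '(?:\\.|[^'\\])*'           # single quoted string
        |
        (?:[^/\n"']|/[^/*\n"'])+    # code that is neither string nor comment
        |
        \n                          # newline
    )
    |
    /\*(?:[^*]|\*+[^*/])*\*+/       # /* comment */ closed on the same line
    |
    //.*$                           # // comment running to the end of the line
    """,
    re.VERBOSE,
)


def strip_comments(line):
    """Return one line of JavaScript with its comments removed (assumed usage)."""
    # Matches that are comments leave group 1 unset, so they become "".
    return RE_MAIN.sub(lambda m: m.group(1) or "", line)
```

Because "//" inside a quoted string is consumed by the string alternative first, a line such as `s = "http://x"; // note` keeps its URL and loses only the trailing comment. Comments that span several lines would still need the separate start and end patterns.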

This all sounds messy to me. You might be better off explaining what problem you're really trying to solve so folks can help find a more elegant solution to the real problem.

jfriend00 answered Oct 02 '22 03:10