There is a JavaScript parser at least in C and Java (Mozilla), in JavaScript (Mozilla again) and Ruby. Is there any currently out there for Python?
I don't need a JavaScript interpreter, per se, just a parser that's up to ECMA-262 standards.
A quick google search revealed no immediate answers, so I'm asking the SO community.
Nowadays, there is at least one better tool, called slimit
:
SlimIt is a JavaScript minifier written in Python. It compiles JavaScript into more compact code so that it downloads and runs faster.
SlimIt also provides a library that includes a JavaScript parser, lexer, pretty printer and a tree visitor.
Demo:
Imagine we have the following javascript code:
$.ajax({
type: "POST",
url: 'http://www.example.com',
data: {
email: '[email protected]',
phone: '9999999999',
name: 'XYZ'
}
});
And now we need to get email
, phone
and name
values from the data
object.
The idea here would be to instantiate a slimit
parser, visit all nodes, filter all assignments and put them into the dictionary:
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor
data = """
$.ajax({
type: "POST",
url: 'http://www.example.com',
data: {
email: '[email protected]',
phone: '9999999999',
name: 'XYZ'
}
});
"""
parser = Parser()
tree = parser.parse(data)
fields = {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
for node in nodevisitor.visit(tree)
if isinstance(node, ast.Assign)}
print fields
It prints:
{'name': "'XYZ'",
'url': "'http://www.example.com'",
'type': '"POST"',
'phone': "'9999999999'",
'data': '',
'email': "'[email protected]'"}
ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.
The ANTLR site provides many grammars, including one for JavaScript.
As it happens, there is a Python API available - so you can call the lexer (recognizer) generated from the grammar directly from Python (good luck).
I have translated esprima.js to Python:
https://github.com/PiotrDabkowski/pyjsparser
>>> from pyjsparser import parse
>>> parse('var $ = "Hello!"')
{
"type": "Program",
"body": [
{
"type": "VariableDeclaration",
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "$"
},
"init": {
"type": "Literal",
"value": "Hello!",
"raw": '"Hello!"'
}
}
],
"kind": "var"
}
]
}
It's a manual translation so its very fast, takes about 1 second to parse angular.js
file (so 100k characters per second). It supports whole ECMAScript 5.1 and parts of version 6 - for example Arrow functions, const
, let
.
If you need support for all the newest JS6 features you can translate esprima on the fly with Js2Py:
import js2py
esprima = js2py.require("[email protected]")
esprima.parse("a = () => {return 11};")
# {'body': [{'expression': {'left': {'name': 'a', 'type': 'Identifier'}, 'operator': '=', 'right': {'async': False, 'body': {'body': [{'argument': {'raw': '11', 'type': 'Literal', 'value': 11}, 'type': 'ReturnStatement'}], 'type': 'BlockStatement'}, 'expression': False, 'generator': False, 'id': None, 'params': [], 'type': 'ArrowFunctionExpression'}, 'type': 'AssignmentExpression'}, 'type': 'ExpressionStatement'}], 'sourceType': 'script', 'type': 'Program'}
As pib mentioned, pynarcissus is a Javascript tokenizer written in Python. It seems to have some rough edges but so far has been working well for what I want to accomplish.
Updated: Took another crack at pynarcissus and below is a working direction for using PyNarcissus in a visitor pattern like system. Unfortunately my current client bought the next iteration of my experiments and have decided not to make it public source. A cleaner version of the code below is on gist here
from pynarcissus import jsparser
from collections import defaultdict
class Visitor(object):
CHILD_ATTRS = ['thenPart', 'elsePart', 'expression', 'body', 'initializer']
def __init__(self, filepath):
self.filepath = filepath
#List of functions by line # and set of names
self.functions = defaultdict(set)
with open(filepath) as myFile:
self.source = myFile.read()
self.root = jsparser.parse(self.source, self.filepath)
self.visit(self.root)
def look4Childen(self, node):
for attr in self.CHILD_ATTRS:
child = getattr(node, attr, None)
if child:
self.visit(child)
def visit_NOOP(self, node):
pass
def visit_FUNCTION(self, node):
# Named functions
if node.type == "FUNCTION" and getattr(node, "name", None):
print str(node.lineno) + " | function " + node.name + " | " + self.source[node.start:node.end]
def visit_IDENTIFIER(self, node):
# Anonymous functions declared with var name = function() {};
try:
if node.type == "IDENTIFIER" and hasattr(node, "initializer") and node.initializer.type == "FUNCTION":
print str(node.lineno) + " | function " + node.name + " | " + self.source[node.start:node.initializer.end]
except Exception as e:
pass
def visit_PROPERTY_INIT(self, node):
# Anonymous functions declared as a property of an object
try:
if node.type == "PROPERTY_INIT" and node[1].type == "FUNCTION":
print str(node.lineno) + " | function " + node[0].value + " | " + self.source[node.start:node[1].end]
except Exception as e:
pass
def visit(self, root):
call = lambda n: getattr(self, "visit_%s" % n.type, self.visit_NOOP)(n)
call(root)
self.look4Childen(root)
for node in root:
self.visit(node)
filepath = r"C:\Users\dward\Dropbox\juggernaut2\juggernaut\parser\test\data\jasmine.js"
outerspace = Visitor(filepath)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With