Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PyV8, can I manipulate DOM structure?

Tags:

python

v8

Lets assume that we have PyV8:

import PyV8
ctxt = PyV8.JSContext()

and a python DOM structure, for example xml.dom

How can I feed a .js-file to PyV8 so that it could change DOM-structure that I have.
If I had it's content:

$("#id").remove();

I want dom item to be removed.

PyV8 has perfect hello-world example. But I'd like to see something usefull.

To be clear, what I want to do is:
"Javascript file" -->-- magic -->-- DOM, (already built with html file) and changed now with passed javascript file

like image 813
Sergey Avatar asked Jun 19 '12 14:06

Sergey


2 Answers

A good example for what you're trying to do can be found here:

https://github.com/buffer/thug

It's a python http client executing JS via PyV8 for security research purposes, but can be strapped down easily for simpler needs.

like image 102
Michael Avatar answered Sep 27 '22 18:09

Michael


Appologies for the formatting. I spaced as best I could, but my screen reader doesn't like SO's formatting controls.

I'm going to take a shot at answering your question, though it seems a tad vague. Please let me know if I need to rewrite this answer to fit a different situation. I assume you are trying to get an HTML file from the web, and run Javascript from inside this file, to act on said document. Unfortunately, none of the Python xml libraries have true DOM support, and W3C DOM compliance is nonexistent in every package I have found. What you can do is use the PyV8 w3c.py dom file as a starting example, and create your own full DOM. W3C Sample Dom You will need to rewrite this module, though, as it does not respect quotes or apostrophys. BeautifulSoup is also not the speediest parser. I would recommend using something like lxml.etree's target parser option. LXML Target Parser Search for "The feed parser interface". Then, you can load an HTML/Script document with LXML, parse it as below, and run each of the scripts you need on the created DOM.

Find a partial example below. (Please note that the HTML standards are massive, scattered, and _highly browser specific, so your milage may vary).

class domParser(object):
    def __init__(self):
    #initialize dom object here, and obtain the root for the destination file object.
        self.dom = newAwesomeCompliantDom()
        self.document = self.dom.document
        self.this = self.document

    def comment(self, commentText):
    #add commentText to self.document or the above dom object you created
        self.this.appendChild(self.document.DOMImplementation.createComment(commentText))

    def start(self, tag, attrs):
    #same here
        self.this = self.this.appendChild(self.document.DOMImplimentation.newElement(tag,attrs))

    def data(self, dataText):
    #append data to the last accessed element, as a new Text child
        self.this.appendChild(self.document.DOMImpl.createDataNode(dataText))

    def end(self):
    #closing element, so move up the tree
        self.this = self.this.parentNode

    def close(self):
        return self.document

#unchecked, please validate yourself
x = lxml.etree.parse(target=domParser)
x.feed(htmlFile)
newDom = x.close()
like image 39
Brandon McGinty Avatar answered Sep 27 '22 17:09

Brandon McGinty