HTML page vastly different when using a headless webkit implementation using PyQT

Question

I was under the impression that a headless WebKit browser built with PyQt would give me the full HTML for any URL, even for pages with heavy JavaScript. But I am only getting part of the page compared with what I get when I save the same page from a Firefox window.

I am using the following code:

# Imports assume PyQt4; adjust the module paths for PySide or PyQt5.
import sys
from PyQt4.QtCore import QTimer, QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage

# SEC and USER_AGENT are constants defined elsewhere (not shown).


class JabbaWebkit(QWebPage):
    # 'html' is a class variable
    html = ''

    def __init__(self, url, wait, app, parent=None):
        super(JabbaWebkit, self).__init__(parent)
        JabbaWebkit.html = ''

        if wait:
            # quit after a fixed delay
            QTimer.singleShot(wait * SEC, app.quit)
        else:
            # quit as soon as the initial load finishes
            self.loadFinished.connect(app.quit)

        self.mainFrame().load(QUrl(url))

    def save(self):
        JabbaWebkit.html = self.mainFrame().toHtml()

    def userAgentForUrl(self, url):
        return USER_AGENT


def get_page(url, wait=None):
    # here is the trick how to call it several times
    app = QApplication.instance()  # checks if QApplication already exists

    if not app:  # create QApplication if it doesn't exist
        app = QApplication(sys.argv)

    form = JabbaWebkit(url, wait, app)
    app.aboutToQuit.connect(form.save)
    app.exec_()
    return JabbaWebkit.html

Can someone see anything obviously wrong with the code?

After running the code against a few URLs, here is one that shows the problem quite clearly: http://www.chilis.com/EN/Pages/menu.aspx

Thanks for any pointers.

user2647646 · Accepted Answer

The page uses AJAX. When the initial load finishes, the page still needs some time to update itself through its AJAX requests, but your code quits as soon as the load finishes (when no wait is given).

You should add some code like this to wait for a while and let WebKit process events:

# requires "import time"
for i in range(200):  # wait about 2 seconds (200 iterations x 10 ms)
    app.processEvents()
    time.sleep(0.01)
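
The wait loop above could also be wrapped in a small reusable helper. The following is only a sketch: the name `pump_events` and its parameters (`seconds`, `interval`, `done`) are made up for illustration and are not part of PyQt. It takes the event-processing callable as an argument, so it does not import Qt itself, and it can stop early if an optional `done` check reports that the page is ready.

```python
import time


def pump_events(process_events, seconds=2.0, interval=0.01, done=None):
    """Call process_events() repeatedly for up to `seconds` seconds.

    process_events: callable that lets the GUI toolkit handle pending
                    events (with PyQt this would be app.processEvents).
    done:           optional callable; if it returns True the loop stops early.
    Returns True if done() reported completion, otherwise False.
    """
    deadline = time.time() + seconds
    while time.time() < deadline:
        process_events()
        if done is not None and done():
            return True
        time.sleep(interval)
    return False
```

In the question's code this would presumably be called as `pump_events(app.processEvents, seconds=2)` after the load has finished and before reading `mainFrame().toHtml()`. How long to wait depends on the page, so the 2-second default is just the value from the loop above, not a recommendation.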

Tags: python, pyqt, pyside

user220201
