Python scraping of javascript web pages fails for https pages only

Question

I'm using PyQt5 to scrape web pages, which works great for http:// URLs, but not at all for https:// URLs.

The relevant part of my script is below:

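from PyQt5.QtCore import QTimer, QUrl
from PyQt5.QtWidgets import qApp
from PyQt5.QtWebKitWidgets import QWebPage

class WebPage(QWebPage):
    def __init__(self):
        super(WebPage, self).__init__()

        # Single-shot timer: wait 2 seconds after a load finishes
        # before processing, to give page scripts time to run
        self.timerScreen = QTimer()
        self.timerScreen.setInterval(2000)
        self.timerScreen.setSingleShot(True)
        self.timerScreen.timeout.connect(self.handleLoadFinished)

        self.loadFinished.connect(self.timerScreen.start)

    def start(self, urls):
        # Keep an iterator so pages are fetched one at a time
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        # Load the next URL; return False once the list is exhausted
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.mainFrame().load(QUrl(url))
        return True

    def processCurrentPage(self):
        url = self.mainFrame().url().toString()
        html = self.mainFrame().toHtml()

        # Do stuff with html
        print('loaded: [%d bytes] %s' % (self.bytesReceived(), url))

    def handleLoadFinished(self):
        # Process the loaded page, then quit when no URLs remain
        self.processCurrentPage()
        if not self.fetchNext():
            qApp.quit()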

For secure pages, the script returns a blank page. The only HTML that comes back is <html><head></head><body></body></html>.

I'm at a bit of a loss. Is there a setting that I'm missing related to handling secure URLs?

Abhishek Menon · Accepted Answer

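If you're on Windows, please see the related question "Build PyQt5 on Windows with OpenSSL support?". A blank page for https:// URLs only usually suggests that Qt cannot find the OpenSSL libraries at runtime, so secure requests fail while plain http:// requests keep working.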
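Before rebuilding anything, it may be worth confirming that missing SSL support really is the cause. The sketch below is only a diagnostic under some assumptions: the helper names (ssl_support_report, log_ssl_errors, attach_ssl_error_logging) are made up for illustration, and it assumes the page object is a QWebPage like the WebPage class in the question. QSslSocket.supportsSsl() and the sslErrors signal of QNetworkAccessManager are the Qt pieces it relies on.

```python
def ssl_support_report():
    """Return a dict describing Qt's SSL support, or why it could not be checked."""
    try:
        # Imported lazily so the check degrades gracefully without PyQt5
        from PyQt5.QtNetwork import QSslSocket
    except ImportError as exc:
        return {"available": False,
                "detail": "PyQt5.QtNetwork not importable: %s" % exc}
    return {
        # False means Qt could not load an SSL library at runtime
        "available": bool(QSslSocket.supportsSsl()),
        "detail": QSslSocket.sslLibraryVersionString() or "no SSL library loaded",
    }


def log_ssl_errors(reply, errors):
    """Slot for the sslErrors signal: print each error and return the messages."""
    messages = [error.errorString() for error in errors]
    for message in messages:
        print("SSL error for %s: %s" % (reply.url().toString(), message))
    return messages


def attach_ssl_error_logging(page):
    """Hypothetical helper: report SSL errors raised while `page` loads a URL."""
    page.networkAccessManager().sslErrors.connect(log_ssl_errors)


if __name__ == "__main__":
    print(ssl_support_report())
```

If "available" comes back False, the OpenSSL libraries are probably not being found. If it is True and pages are still blank, calling attach_ssl_error_logging(page) on the WebPage instance before start() may show certificate problems instead.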

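Have you considered using Beautiful Soup or Scrapy?

I have used Beautiful Soup for my project and it worked like a charm, including on https:// pages. Keep in mind that Beautiful Soup only parses the HTML it is given; the SSL handling comes from whatever HTTP client fetches the page, and neither tool runs the page's JavaScript on its own.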
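As a rough illustration of the Beautiful Soup route, the sketch below fetches a page with the standard library's urlopen (which handles https) and pulls out the links. The function names extract_links and fetch_links are hypothetical, and it assumes the bs4 package is installed. It will not see content that the page builds with JavaScript.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def fetch_links(url):
    """Download `url` (http or https) and return the links found in it."""
    with urlopen(url, timeout=10) as response:
        html = response.read()
    return extract_links(html)


if __name__ == "__main__":
    # Static sample so the example runs without network access
    sample = ('<html><body><a href="https://example.com/a">A</a>'
              '<a name="x">no href</a></body></html>')
    print(extract_links(sample))
```

For a live page, fetch_links("https://...") would be called with the real URL in place of the static sample.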

Tags: python, https, ssl, pyqt, pyqt5

seymourgoestohollywood
