Python Requests-HTML Render() - No Content

I'd like to scrape a page whose content seems to be rendered by an app mounted on an element in the HTML, like this:

<div id="app" class="app-mobile-pusher"></div>

I'm using the render() method from the Requests-HTML Python library like so:

from requests_html import HTMLSession

# login_url, payload and content_url are defined elsewhere
with HTMLSession() as session:
    p = session.post(login_url, data=payload)
    r = session.get(content_url)
    r.html.render()
    print(r.text)

This code returns the HTML for the page without any errors, but also without any content (just HTML tags). Notes:

  • I've tried adding timeout arguments to session.get to give the page more time to render before accessing it, as well as other variations on the syntax above.

  • I also tried adding user-agent information in the headers, based on this answer (to keep my automated scrape from being rejected).

  • The Chromium browser did download when I first ran render().

The lack of any error messages is stumping me and it is difficult to replicate the context of this request to test on another site.

Any specific suggestions for how to solve this, or ideas for how to go about troubleshooting it, would be appreciated. (Python 3.6, macOS)

Dyneken asked Nov 13 '18 00:11


1 Answer

Have you tried print(r.html.html) instead? After render() runs, the rendered markup is stored on that attribute, while r.text still holds the original, unrendered response body.
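To make this easier to troubleshoot, here is a minimal sketch that fetches the page, renders it, and returns both versions of the markup so they can be compared. The names element_text and fetch_rendered are hypothetical helpers, not part of Requests-HTML, and the sleep value of 2 seconds is only a guess at how long the app might need. The helper uses the standard-library html.parser to pull the text out of the element with id "app" from the question.

```python
from html.parser import HTMLParser


class _ElementText(HTMLParser):
    """Collects the text found inside the element with a given id."""

    # Void elements never get a closing tag, so they must not change depth.
    _VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self._VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def element_text(html, element_id="app"):
    """Return the stripped text inside the element with the given id."""
    parser = _ElementText(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def fetch_rendered(content_url, login_url=None, payload=None, sleep=2):
    """Return (raw, rendered) markup for content_url.

    Hypothetical wrapper around the code in the question; the sleep
    default is an assumption, adjust it for the page being scraped.
    """
    # Imported here so element_text works without requests-html installed.
    from requests_html import HTMLSession

    with HTMLSession() as session:
        if login_url:
            session.post(login_url, data=payload)
        r = session.get(content_url)
        r.html.render(sleep=sleep)   # pause after load so the app can fill the page
        return r.text, r.html.html   # r.text is unrendered; r.html.html is rendered
```

Calling raw, rendered = fetch_rendered(content_url, login_url, payload) and then comparing element_text(raw) with element_text(rendered) should show whether rendering actually filled the div. If both come back empty, the problem is probably upstream of the print call (for example the login, or the page needing more time).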

StanKosy answered Oct 11 '22 20:10