How to retrieve the values of dynamic html content using Python

I'm using Python 3 and I'm trying to retrieve data from a website. However, this data is dynamically loaded and the code I have right now doesn't work:

url = eveCentralBaseURL + str(mineral)
print("URL : %s" % url);

response = request.urlopen(url)
data = str(response.read(10000))

data = data.replace("\\n", "\n")
print(data)

Where I'm trying to find a particular value, I'm finding a template instead e.g."{{formatPrice median}}" instead of "4.48".

How can I make it so that I can retrieve the value instead of the placeholder text?

Edit: This is the specific page I'm trying to extract information from. I'm trying to get the "median" value, which uses the template {{formatPrice median}}

Edit 2: I've installed and set up my program to use Selenium and BeautifulSoup.

The code I have now is:

from bs4 import BeautifulSoup
from selenium import webdriver

#...

driver = webdriver.Firefox()
driver.get(url)

html = driver.page_source
soup = BeautifulSoup(html)

print "Finding..."

for tag in soup.find_all('formatPrice median'):
    print tag.text

Here is a screenshot of the program as it's executing. Unfortunately, it doesn't seem to be finding anything with "formatPrice median" specified.

How do I print dynamic value in HTML?

There may be a better / easier way to do this, but you can create a directive that takes the name of the attribute and then sets it on the element. Then you can use it like: <div class="myStyle" [customAttr]="printThisValue"> and it will be emitted with <div class="myStyle" customAttr> .

Can we make dynamic website using Python?

Discover the concepts of creating dynamic web pages (HTML) with Python. This book reviews several methods available to serve up dynamic HTML including CGI, SSI, Django, and Flask. You will start by covering HTML pages and CSS in general and then move on to creating pages via CGI.

Assuming you are trying to get values from a page that is rendered using javascript templates (for instance something like handlebars), then this is what you will get with any of the standard solutions (i.e. beautifulsoup or requests).

This is because the browser uses javascript to alter what it received and create new DOM elements. urllib will do the requesting part like a browser but not the template rendering part. A good description of the issues can be found here. This article discusses three main solutions:

parse the ajax JSON directly
use an offline Javascript interpreter to process the request SpiderMonkey, crowbar
use a browser automation tool splinter

This answer provides a few more suggestions for option 3, such as selenium or watir. I've used selenium for automated web testing and its pretty handy.

EDIT

From your comments it looks like it is a handlebars driven site. I'd recommend selenium and beautiful soup. This answer gives a good code example which may be useful:

from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://eve-central.com/home/quicklook.html?typeid=34')

html = driver.page_source
soup = BeautifulSoup(html)

# check out the docs for the kinds of things you can do with 'find_all'
# this (untested) snippet should find tags with a specific class ID
# see: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class
for tag in soup.find_all("a", class_="my_class"):
    print tag.text

Basically selenium gets the rendered HTML from your browser and then you can parse it using BeautifulSoup from the page_source property. Good luck :)

How to retrieve the values of dynamic html content using Python

Tags:

python

html

templates

urllib

Tagc

People also ask

1 Answers

will-hart

Recent Activity

Donate For Us

How to retrieve the values of dynamic html content using Python

Tags:

python

html

templates

urllib

Tagc

People also ask

1 Answers

will-hart

Related questions

Recent Activity

Donate For Us