python beautifulsoup iframe document html extract

Tags: python, html, beautifulsoup, iframe

I am trying to learn a bit of Beautiful Soup and to get some HTML data out of some iframes, but I have not been very successful so far.

Parsing the iframe element itself does not seem to be a problem with BS4, but whatever I do, I cannot get at the embedded content.

For example, consider the iframe below (this is what I see in Chrome Developer Tools):

<iframe frameborder="0" marginwidth="0" marginheight="0" scrolling="NO"
src="http://www.engineeringmaterials.com/boron/728x90.html" width="728" height="90">
#document <html>...</html></iframe>

where <html>...</html> is the content I am interested in extracting.

However, when I use the following BS4 code:

iFrames = []  # quick bs4 example; soup is the already-parsed page
for iframe in soup.find_all("iframe"):
    iFrames.append(iframe.extract())  # extract the current iframe, not soup.iframe

I get:

<iframe frameborder="0" marginwidth="0" marginheight="0" scrolling="NO" src="http://www.engineeringmaterials.com/boron/728x90.html" width="728" height="90">

In other words, I get the iframe elements without the #document <html>...</html> inside them.

I tried something along the lines of:

iFrames = []  # quick bs4 example
iframexx = soup.find_all('iframe')
for iframe in iframexx:
    print(iframe.find_all('html'))  # look for a nested <html> inside each iframe

...but this does not work either: no <html> is found inside any of the iframes.
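A minimal reproduction shows why both attempts come up empty. It assumes that the page source Python receives contains only the empty `<iframe>` element; the markup below is the iframe from the question minus the `#document` part, which only exists in the browser's live DOM:

```python
from bs4 import BeautifulSoup

# The iframe as it appears in the raw page source (assumed): no child document.
page_html = (
    '<html><body>'
    '<iframe frameborder="0" marginwidth="0" marginheight="0" scrolling="NO" '
    'src="http://www.engineeringmaterials.com/boron/728x90.html" '
    'width="728" height="90"></iframe>'
    '</body></html>'
)

soup = BeautifulSoup(page_html, "html.parser")
iframe = soup.find("iframe")

print(iframe.contents)          # → []  (the element has no children)
print(iframe.find_all("html"))  # → []  (so there is no <html> to find inside it)
print(iframe["src"])            # only the URL of the embedded document is available
```

All the parser can offer is the `src` URL; the embedded document itself was never part of the downloaded HTML.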

So my question is: how do I reliably extract these #document objects (<html>...</html>) from the iframe elements?

asked Apr 12 '14 09:04

AJW

1 Answer

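Browsers load the iframe content in a separate request to the URL in the iframe's src attribute; the #document you see in the developer tools is that separately loaded page, not part of the HTML you parsed. You'll have to make the same request yourself: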

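from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen
from bs4 import BeautifulSoup

for iframe in iframexx:
    response = urlopen(iframe['src'])  # separate request for the iframe's own document
    iframe_soup = BeautifulSoup(response, 'html.parser')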
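The loop above passes `src` straight to `urlopen`, which only works when it is an absolute URL. A slightly more defensive sketch, assuming `src` may be relative to the page URL or carry stray whitespace, resolves it with `urljoin` first. The helper names `fetch_html` and `iframe_documents`, and the `fetch` hook (there only so the downloader can be swapped out, e.g. in tests), are hypothetical:

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw bytes (one extra request per iframe)."""
    with urlopen(url) as response:
        return response.read()


def iframe_documents(page_html, base_url, fetch=fetch_html):
    """Return {absolute_iframe_url: BeautifulSoup} for every <iframe> that has a src.

    `fetch` is a hypothetical hook so the downloader can be replaced.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    documents = {}
    for iframe in soup.find_all("iframe", src=True):  # skip iframes without src
        # src may be relative and may have stray whitespace, so normalise it first.
        url = urljoin(base_url, iframe["src"].strip())
        documents[url] = BeautifulSoup(fetch(url), "html.parser")
    return documents
```

With a real page you would pass its HTML and its own URL, e.g. `iframe_documents(fetch_html(page_url), page_url)`; each value is then an ordinary soup you can search with `find_all`.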

Remember: BeautifulSoup is not a browser; it won't fetch images, CSS and JavaScript resources for you either.
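To make that point concrete, here is a small sketch that only lists the sub-resource URLs a browser would request on its own; BeautifulSoup reports them but never downloads them, so fetching each one is up to you. The `resource_urls` helper and its tag/attribute table are hypothetical and illustrative, not exhaustive:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Tag -> attribute a browser follows automatically (illustrative, not exhaustive).
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href", "iframe": "src"}


def resource_urls(page_html, base_url):
    """Absolute URLs of images, scripts, stylesheets and iframes, in document order."""
    soup = BeautifulSoup(page_html, "html.parser")
    urls = []
    for tag in soup.find_all(list(RESOURCE_ATTRS)):
        # Only stylesheet <link>s are fetched; skip rel="canonical" and the like.
        if tag.name == "link" and "stylesheet" not in (tag.get("rel") or []):
            continue
        value = tag.get(RESOURCE_ATTRS[tag.name])
        if value:  # inline <script> blocks have no src
            urls.append(urljoin(base_url, value.strip()))
    return urls
```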

answered Sep 20 '22 13:09

Martijn Pieters
