Using Python and BeautifulSoup (webpage source code saved to a local file)

Tags:

beautifulsoup

I am using Python 2.7 + BeautifulSoup 4.3.2.

I am trying to use Python and BeautifulSoup to extract information from a webpage. Because the page is on a company website that requires login and redirection, I copied the target page's source code into a file and saved it as "example.html" in C:\ so that I can practice on it.

This is part of the page's HTML source:

<tr class="ghj">
    <td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&amp;u=12563">port_new_cape</a></td>
    <td class="position"><a href="./search.php?id=12563&amp;sr=positions" title="Search positions">452</a></td>
    <td class="details"><div>South</div></td>
    <td>May 09, 1997</td>
    <td>Jan 23, 2009 12:05 pm&nbsp;</td>
</tr>

The code I worked out so far is:

from bs4 import BeautifulSoup
import re
import urllib2

url = "C:\example.html"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

cities = soup.find_all('span', {'class' : 'city-sh'})

for city in cities:
    print city

This is just the first stage of testing, so it's somewhat incomplete.

However, when I run it, I get the error below. It seems that urllib2.urlopen cannot open a local file given as a plain path.

Traceback (most recent call last):
  File "C:\Python27\Testing.py", line 8, in <module>
    page = urllib2.urlopen(url)
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 404, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 427, in _open
    'unknown_open', req)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1247, in unknown_open
    raise URLError('unknown url type: %s' % type)
URLError: <urlopen error unknown url type: c>

How can I practice using a local file?

323

asked Feb 05 '14 07:02

Mark K

1 Answer

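The best way to parse a local file with BeautifulSoup is to pass an open file object directly to the BeautifulSoup constructor. urllib2.urlopen expects a URL, so it treats the "C:" in the bare path as a URL scheme, which is why it fails with "unknown url type: c". See the documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup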

from bs4 import BeautifulSoup

with open("C:\\example.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

for city in soup.find_all('span', {'class' : 'city-sh'}):
    print(city)
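If you would rather keep the urlopen call from the question, one possible alternative is to hand it a file:// URL instead of a bare path. The sketch below is only an illustration under some assumptions: it targets Python 3, where urllib2.urlopen lives at urllib.request.urlopen, and the helper name read_local_html and the temporary sample file (standing in for C:\example.html) are hypothetical. It uses only the standard library; the bytes it returns could then be passed to BeautifulSoup in the same way as the file object above.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_local_html(path):
    """Read a local file through urlopen by converting its path to a file:// URL.

    Hypothetical helper for illustration; not part of BeautifulSoup or urllib.
    """
    # as_uri() only works on absolute paths, so resolve() the path first.
    file_url = Path(path).resolve().as_uri()
    with urlopen(file_url) as response:
        return file_url, response.read()


# Demo with a throwaway file standing in for C:\example.html (made-up content).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "example.html"
    sample.write_text('<span class="city-sh">port_new_cape</span>', encoding="utf-8")
    url, html_bytes = read_local_html(sample)

print(url)
print(html_bytes.decode("utf-8"))
```

Opening the file directly, as in the answer's code, is still the simpler route; the file:// form is mainly useful when the same code has to handle both remote URLs and local copies.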

136

answered Oct 21 '22 01:10

CasualDemon
