I wanted to write a piece of code like the following:
from bs4 import BeautifulSoup
import urllib2
url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url)
soup = BeautifulSoup(html)
But I found that I now have to install the urllib3 package.
Moreover, I couldn't find any tutorial or example explaining how to rewrite the above code; for example, urllib3 does not have urlopen.
Any explanation or example, please?!
P.S. I'm using Python 3.4.
True, if you want to avoid adding any dependencies, urllib is available. But note that even the Python official documentation recommends the requests library: "The Requests package is recommended for a higher-level HTTP client interface."
The urllib package is Python's standard-library package for handling URLs (Uniform Resource Locators). Its urlopen function (in urllib.request) can fetch URLs over a variety of protocols. urllib collects several modules for working with URLs: urllib.request, urllib.error, urllib.parse and urllib.robotparser. In Python 3, what used to be urllib2 lives in urllib.request and urllib.error.
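For reference, here is a minimal Python 3 port of the question's code using only the standard library. The helper name fetch, the 10-second timeout and the default User-Agent string are illustrative assumptions, not part of any API; the User-Agent is set explicitly because some sites may reject Python's default one.

```python
import urllib.request


def fetch(url, timeout=10.0, user_agent="Mozilla/5.0"):
    """Python 3 stand-in for urllib2.urlopen(url).read() (hypothetical helper)."""
    # urllib2.urlopen moved to urllib.request.urlopen in Python 3.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # read() returns bytes; BeautifulSoup can detect the encoding itself.
        return response.read()
```

You would then parse the result as before: soup = BeautifulSoup(fetch(url), "html.parser").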
urllib3 is a third-party library, separate from the standard library's urllib and urllib2. It adds features the standard-library modules lack, such as connection pooling (re-using connections), if you need them. The documentation is here: https://urllib3.readthedocs.org/
If you'd like to use urllib3, you'll need to pip install urllib3. A basic example looks like this:
from bs4 import BeautifulSoup
import urllib3
http = urllib3.PoolManager()
url = 'http://www.thefamouspeople.com/singers.php'
response = http.request('GET', url)
soup = BeautifulSoup(response.data, "html.parser")  # name the parser explicitly
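Since connection re-use is one of the main reasons to choose urllib3, here is a sketch that shares one PoolManager across several requests and also sets a timeout and a retry limit. The function name fetch_many and the 10-second / 3-retry values are arbitrary assumptions; adjust them to your needs.

```python
import urllib3


def fetch_many(urls, timeout=10.0):
    """Fetch each URL with one shared PoolManager; return {url: body bytes}."""
    bodies = {}
    # A single PoolManager keeps a connection pool per host, so repeated
    # requests to the same host can re-use an already open connection.
    with urllib3.PoolManager(
        timeout=urllib3.Timeout(connect=timeout, read=timeout),
        retries=urllib3.Retry(total=3),  # assumed retry budget
    ) as http:
        for url in urls:
            response = http.request("GET", url)
            if response.status != 200:
                raise RuntimeError("{} returned HTTP {}".format(url, response.status))
            bodies[url] = response.data  # raw bytes
    return bodies
```

Each body in the returned dict can then be passed to BeautifulSoup(body, "html.parser").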
You do not have to install urllib3. You can choose any HTTP library that fits your needs and feed the response to BeautifulSoup. The usual choice, though, is requests, because of its rich feature set and convenient API. You can install requests by entering pip install requests on the command line. Here is a basic example:
from bs4 import BeautifulSoup
import requests
url = "url"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
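A slightly more defensive requests sketch follows; the helper name fetch_html and the 10-second default are assumptions for illustration. It sets a timeout, because requests does not time out unless you pass one, and calls raise_for_status() so that 4xx/5xx responses raise an exception instead of being parsed as if they were the page you wanted.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Download url and return the body as bytes (hypothetical helper)."""
    # Without an explicit timeout, requests can wait indefinitely.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    # Return raw bytes and let BeautifulSoup work out the encoding.
    return response.content
```

Returning response.content (bytes) rather than response.text is a deliberate choice: for a text/* response whose Content-Type header has no charset, requests decodes .text as ISO-8859-1, whereas BeautifulSoup given bytes can look for a <meta charset> declaration in the page itself.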
The urllib3 library has good documentation (the User Guide).
To get your desired result, you should do the following:
import urllib3
from bs4 import BeautifulSoup
url = 'http://www.thefamouspeople.com/singers.php'
http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'), "html.parser")
The .decode('utf-8') call is optional, because BeautifulSoup also accepts raw bytes and detects the encoding itself. It worked without it when I tried, but I posted the option anyway.
Source: User Guide
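Hard-coding .decode('utf-8') only works for UTF-8 pages. If you do want to decode yourself, a sketch that takes the charset from the Content-Type response header might look like this. decode_body is a hypothetical helper, and the UTF-8 fallback and errors="replace" are assumptions you may want to change.

```python
from email.message import Message


def decode_body(data, content_type=None, default="utf-8"):
    """Decode bytes using the charset named in a Content-Type header value."""
    charset = None
    if content_type:
        # email.message.Message can parse header parameters such as
        # "text/html; charset=ISO-8859-1".
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased name, or None
    try:
        # errors="replace" keeps one bad byte from aborting the whole page.
        return data.decode(charset or default, errors="replace")
    except LookupError:
        # The header named a charset Python does not know; use the default.
        return data.decode(default, errors="replace")
```

With urllib3 you would call decode_body(response.data, response.headers.get("Content-Type")).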