
How to get the protocol (http or https) of the website using Python

I'm wondering how to imitate, in Python, the way a browser such as Chrome detects a website's protocol. For example, when we type "stackoverflow.com" in the address bar and press Enter, the browser automatically detects the protocol and changes the URL to "https://stackoverflow.com". How can we do the same in Python, something like:

url = "stackoverflow.com"
browser = Browser(url)  # Browser is a hypothetical class that fetches the site content from a URL, exposes its protocol, ...
print(browser.protocol)

https

Is there any library or package that helps do this? Thanks a lot.

Edit: My question is different from the others, which ask how to redirect to https when http is entered. As I mentioned, can we detect the protocol automatically at the very first step, without supplying a dummy protocol?

Blurie asked Nov 29 '22 09:11


1 Answer

It works for stackoverflow because when you first visit stackoverflow.com on port 80 (the http port), stackoverflow's servers notify the browser that the link has been permanently moved to https.

To detect the same in Python, use the requests library, like this:

>>> import requests
>>> r = requests.get('http://stackoverflow.com') # first we try http
>>> r.url # check the actual URL for the site
'https://stackoverflow.com/'
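
The same check can be wrapped in a small helper. This is only a minimal sketch, not part of the original answer: the name `detect_protocol` and its `fetch` parameter are made up for illustration, and it assumes the server redirects plain http to https the way stackoverflow.com does.

```python
from urllib.parse import urlsplit


def detect_protocol(host, fetch=None, timeout=10):
    """Return the scheme ('http' or 'https') of the URL the site ends up on.

    `host` is a bare name such as 'stackoverflow.com'. `fetch` is a
    hypothetical hook: any callable that takes a URL and returns an object
    with a `.url` attribute. When it is omitted, requests.get is used.
    """
    if fetch is None:
        import requests  # imported lazily so the helper can be exercised offline

        def fetch(url):
            # requests follows redirects for GET requests by default
            return requests.get(url, timeout=timeout)

    response = fetch("http://" + host)  # start from plain http, as in the session above
    return urlsplit(response.url).scheme
```

Calling `detect_protocol("stackoverflow.com")` should give the scheme of the final URL after redirects. Note that a site which serves both http and https without redirecting would be reported as http, because the first request already succeeds.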

To find out how the URL changed, look at the history attribute (a list of the intermediate responses), and you will see a 301 response, which means the URI has moved permanently to a new address.

>>> r.history[0]
<Response [301]>
>>> r.history[0].url # this is the original URL we tried
'http://stackoverflow.com/'
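
If there is more than one redirect, the whole chain can be listed in one go. The helper below is a rough sketch under the assumption that it receives a requests response (or any object with the same `history`, `status_code` and `url` attributes); the name `redirect_chain` is hypothetical.

```python
def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, final response last."""
    # response.history holds the intermediate (redirect) responses, oldest first
    hops = list(response.history) + [response]
    return [(hop.status_code, hop.url) for hop in hops]
```

With the response `r` from the session above, `redirect_chain(r)` would start with the 301 entry for the original http URL and end with the entry for the final https URL.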
Burhan Khalid answered Dec 01 '22 22:12