I am trying to log into a website. When I look at print(g.text) I am not getting back the web page I expect but instead a cloudflare page that says 'Checking your browser before accessing'
import requests
import time
s = requests.Session()
s.get('https://www.off---white.com/en/GB/')
headers = {'Referer': 'https://www.off---white.com/en/GB/login'}
payload = {
'utf8':'✓',
'authenticity_token':'',
'spree_user[email]': '[email protected]',
'spree_user[password]': 'PASSWORD',
'spree_user[remember_me]': '0',
'commit': 'Login'
}
r = s.post('https://www.off---white.com/en/GB/login', data=payload, headers=headers)
print(r.status_code)
g = s.get('https://www.off---white.com/en/GB/account')
print(g.status_code)
print(g.text)
Why is this occurring when I have set the session?
Cloudflare uses two cookies as tokens: one to verify you made it past their challenge page and one to track your session. To bypass the challenge page, simply include both of these cookies (with the appropriate user-agent) in all HTTP requests you make. To retrieve just the cookies (as a dictionary), use cloudscraper.
Like urllib2 , requests is blocking. But I wouldn't suggest using another library, either. The simplest answer is to run each request in a separate thread.
Python's Requests Default 'User-Agent' utils.
You might want to try this:
import cloudscraper
scraper = cloudscraper.create_scraper() # returns a CloudScraper instance
# Or: scraper = cloudscraper.CloudScraper() # CloudScraper inherits from requests.Session
print scraper.get("http://somesite.com").text # => "<!DOCTYPE html><html><head>..."
It does not require Node.js dependency. All credits go to this pypi page
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With