I was scraping this aspx website https://gra206.aca.ntu.edu.tw/Temp/W2.aspx?Type=2 .
As it required, I have to parse in __VIEWSTATE and __EVENTVALIDATION while sending a post request. Now I am trying to send a get request first to have those two values, and then parse then afterward.
However, I have tried several times to send a get request. It always turns out throwing this error message:
requests.exceptions.SSLError: HTTPSConnectionPool(host='gra206.aca.ntu.edu.tw', port=443): Max retries exceeded with url: /Temp/W2.aspx?Type=2 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
I have tried:
However, none of them works.
I am currently using:
env:
python 2.7
bs4 4.6.0
request 2.18.4
openssl 1.0.2n
Here is my code:
import requests
from bs4 import BeautifulSoup
with requests.Session() as s:
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})
url = 'https://gra206.aca.ntu.edu.tw/Temp/W2.aspx?Type=2'
r = s.get(url, headers={'x-test2': 'true'})
soup = BeautifulSoup(r.content, 'lxml')
viewstate = soup.find('input', {'id': '__VIEWSTATE' })['value']
validation = soup.find('input', {'id': '__EVENTVALIDATION' })['value']
print viewstate, generator, validation
I am also looking for a solution for it. Some sites have deprecated TLSv1.0 and Requests + Openssl (on Windows 7) has trouble to build handshake with such peer host. Wireshark log showed the TLSv1 Client Hello was issued by the client but the host did not answer correctly. This error propagated up as the error message Requests showed. Even with the most updated Openssl/pyOpenssl/Requests and tried on Py3.6/2.7.12, no luck. Intrestingly when I replace the url to other like "google.com", the log showed TLSv1.2 Hello was issued and responded by the host. Please check images tlsv1 and tlsv1.2. Clearly the client has TLSv1.2 capability but why it use v1.0 Hello in the former case?
[EDIT] I was wrong in previous statement. Wireshark misinterpreted unfinished TLSv1.2 HELLO exchanged as TLSv1. After more digging into it, I found these hosts is expecting pure TLSv1, but not a TLSv1 fallback from TLSv1.2. Due to Openssl's lack of some fields in the Hello extension fields (maybe Supported Version) when compared with the log from Chrome. I found a workaround to it. 1. Force the use of TLSv1 negotiation. 2. Change the default cipher suite to py3.4 style to re-enable 3DES.
import ssl
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager
#from urllib3.poolmanager import PoolManager
from requests.packages.urllib3.util.ssl_ import create_urllib3_context
# py3.4 default
CIPHERS = (
'ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+HIGH:'
'DH+HIGH:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+HIGH:RSA+3DES:!aNULL:'
'!eNULL:!MD5'
)
class DESAdapter(HTTPAdapter):
"""
A TransportAdapter that re-enables 3DES support in Requests.
"""
def create_ssl_context(self):
#ctx = create_urllib3_context(ciphers=FORCED_CIPHERS)
ctx = ssl.create_default_context()
# allow TLS 1.0 and TLS 1.2 and later (disable SSLv3 and SSLv2)
#ctx.options |= ssl.OP_NO_SSLv2
#ctx.options |= ssl.OP_NO_SSLv3
#ctx.options |= ssl.OP_NO_TLSv1
ctx.options |= ssl.OP_NO_TLSv1_2
ctx.options |= ssl.OP_NO_TLSv1_1
#ctx.options |= ssl.OP_NO_TLSv1_3
ctx.set_ciphers( CIPHERS )
#ctx.set_alpn_protocols(['http/1.1', 'spdy/2'])
return ctx
def init_poolmanager(self, *args, **kwargs):
context = create_urllib3_context(ciphers=CIPHERS)
kwargs['ssl_context'] = self.create_ssl_context()
return super(DESAdapter, self).init_poolmanager(*args, **kwargs)
def proxy_manager_for(self, *args, **kwargs):
context = create_urllib3_context(ciphers=CIPHERS)
kwargs['ssl_context'] = self.create_ssl_context()
return super(DESAdapter, self).proxy_manager_for(*args, **kwargs)
tmoval=10
proxies={}
hdr = {'Accept-Language':'zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4', 'Cache-Control':'max-age=0', 'Connection':'keep-alive', 'Proxy-Connection':'keep-alive', #'Cache-Control':'no-cache', 'Connection':'close',
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36',
'Accept-Encoding':'gzip,deflate,sdch','Accept':'*/*'}
ses = requests.session()
ses.mount(url, DESAdapter())
response = ses.get(url, timeout=tmoval, headers = hdr, proxies=proxies)
[EDIT2] When your HTTPS url contains any uppercase letter, the patch would fail to work. You need to reverse them to lowercase. Something unknown in the stack requests/urllib3/openssl cause the patch logic being restored to its default TLS1.2 fashion.
[EDIT3] from http://docs.python-requests.org/en/master/user/advanced/
The mount call registers a specific instance of a Transport Adapter to a prefix. Once mounted, any HTTP request made using that session whose URL starts with the given prefix will use the given Transport Adapter.
So, to make all HTTPS requests include those redirected by the server afterwards to use the new adapter, must change this line to:
ses.mount('https://', DESAdapter())
Somehow it fixed the uppercase problem mentioned above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With