I'm trying to find best way to capture links listed under response headers, exactly like this one and I'm using python requests module. Below is link which has Link Headers section on Python Requests page: docs.python-requests.org/en/latest/user/advanced/
But, in my case my response headers contains links like below:
{'content-length': '12276', 'via': '1.1 varnish-v4', 'links': '<http://justblahblahblah.com/link8.html>;rel="last">,<http://justblahblahblah.com/link2.html>;rel="next">', 'vary': 'Accept-Encoding, Origin'}
Please notice > after "last" which is not the case under Requests examples and I just cant seem to figure out how to solve this.
To add HTTP headers to a request, you can simply pass them in a dict to the headers parameter. Similarly, you can also send your own cookies to a server using a dict passed to the cookies parameter.
HTTP headers let the client and the server pass additional information with an HTTP request or response. All the headers are case-insensitive, headers fields are separated by colon, key-value pairs in clear-text string format.
To send parameters in URL, write all parameter key:value pairs to a dictionary and send them as params argument to any of the GET, POST, PUT, HEAD, DELETE or OPTIONS request. then https://somewebsite.com/?param1=value1¶m2=value2 would be our final url.
There is already a way provided by requests
to access links header
response.links
It returns the dictionary of links header value which can easily parsed further using
response.links['next']['url']
to get the required values.
You can parse the header's value manually. To make things easier you might want to use request's parsing function parse_header_links
as a reference.
Or you can do some find/replace and use original parse_header_links
In [1]: import requests In [2]: d = {'content-length': '12276', 'via': '1.1 varnish-v4', 'links': '<http://justblahblahblah.com/link8.html>;rel="last">,<http://justblahblahblah.com/link2.html>;rel="next">', 'vary': 'Accept-Encoding, Origin'} In [3]: requests.utils.parse_header_links(d['links'].rstrip('>').replace('>,<', ',<')) Out[3]: [{'rel': 'last', 'url': 'http://justblahblahblah.com/link8.html'}, {'rel': 'next', 'url': 'http://justblahblahblah.com/link2.html'}]
If there might be a space or two between >,
and <
then you need to do replace with a regular expression.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With