Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I request (get) and read an xml file using python?

I tried requesting an RSS feed on Treasury Direct using Python. In the past I've used urllib, or requests libraries to serve this purpose and it's worked fine. This time however, I continue to get the 406 status error, which I understand is the page's way of telling me it doesn't accept my header details from the request. I've tried altering it however to no avail.
This is how I've tried

import requests
url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
user_agent = {'User-agent': 'Mozilla/5.0'}
response  = requests.get(url, headers = user_agent)
print response.text

Environments: Python 2.7 and 3.4. I also tried accessing via curl with the same exact error.

I believe this to be page specific, but can't figure out how to appropriately frame a request to read this page.

I found an API on the page which I can read the same data in json so this issue is now more of a curiosity to me than a true problem.

Any answers would be greatly appreciated!

Header Details

{'surrogate-control': 'content="ESI/1.0",no-store', 'content-language': 'en-US', 'x-content-type-options': 'nosniff', 'x-powered-by': 'Servlet/3.0', 'transfer-encoding': 'chunked', 'set-cookie': 'BIGipServerpl_www.treasurydirect.gov_443=3221581322.47873.0000; path=/; Httponly; Secure, TS01598982=016b0e6f4634928e3e7e689fa438848df043a46cb4aa96f235b0190439b1d07550484963354d8ef442c9a3eb647175602535b52f3823e209341b1cba0236e4845955f0cdcf; Path=/', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'keep-alive': 'timeout=10, max=100', 'connection': 'Keep-Alive', 'cache-control': 'no-store', 'date': 'Sun, 23 Apr 2017 04:13:00 GMT', 'x-frame-options': 'SAMEORIGIN', '$wsep': '', 'content-type': 'text/html;charset=ISO-8859-1'}
like image 868
Dom DaFonte Avatar asked Apr 23 '17 04:04

Dom DaFonte


People also ask

Can Python Open XML?

It is Python module, used to read XML file. It provides parse() function to read XML file. We must import Minidom first before using its function in the application. The syntax of this function is given below.

How do I request XML?

If you want to send XML data to the server, set the Request Header correctly to be read by the sever as XML. xmlhttp. setRequestHeader('Content-Type', 'text/xml'); Use the send() method to send the request, along with any XML data.


1 Answers

You need to add accept to headers request:

import requests

url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
headers = {'accept': 'application/xml;q=0.9, */*;q=0.8'}
response = requests.get(url, headers=headers)

print response.text
like image 172
vold Avatar answered Oct 11 '22 15:10

vold