Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to read XML file from URL in python?

I want to access the information present in the sub node. Is this because of the structure of the file?

Tried extracting the author subnode information in a file separately and run python code. That works fine

import urllib
import xml.etree.ElementTree as ET

url = 'https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/fe9e8b7d-61ea-409d-84aa-3ebd79a046b5.xml'

print 'Retrieving', url

document = urllib.urlopen (url).read()
print 'Retrieved', len(document), 'characters.'

print document[:50]

tree = ET.fromstring(document)

lst = tree.findall('title')
print lst[:100]
like image 908
PANKAJ KUMAR Avatar asked Feb 19 '19 10:02

PANKAJ KUMAR


People also ask

How do I view XML files in Python?

To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree. parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot().

Can pandas read XML?

The Pandas data analysis library provides functions to read/write data for most of the file types. For example, it includes read_csv() and to_csv() for interacting with CSV files. However, Pandas does not include any methods to read and write XML files.

How do I read XML files?

XML files can be opened in a browser like IE or Chrome, with any text editor like Notepad or MS-Word. Even Excel can be used to open XML files.


2 Answers

You couldn't find title elements because of the namespace.

Below a sample code to find:

  • Title from "document" tag
  • Title from inner "component" tag
    import xml.etree.ElementTree as ET
    import urllib.request

    url = 'https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/fe9e8b7d-61ea-409d-84aa-3ebd79a046b5.xml'
    response = urllib.request.urlopen(url).read()
    tree = ET.fromstring(response)


    for docTitle in tree.findall('{urn:hl7-org:v3}title'):
        print(docTitle.text)

    for compTitle in tree.findall('.//{urn:hl7-org:v3}title'):
        print(compTitle.text)

UPDATE

If you need to search XML nodes you should use xPath Expressions

Example:

NS = '{urn:hl7-org:v3}'
ID = '829076996'    # ID TO BE FOUND

# XPATH TO FIND AUTHORS BY ID (search ID and return related author node)
xPathAuthorById = ''.join([
    ".//",
    NS, "author/",
    NS, "assignedEntity/",
    NS, "representedOrganization/",
    NS, "id[@extension='", ID,
    "']/../../.."
    ])

# XPATH TO FIND AUTHOR NAME ELEMENT
xPathAuthorName = ''.join([
    "./",
    NS, "assignedEntity/",
    NS, "representedOrganization/",
    NS, "name"
    ])

# FOR EACH AUTHOR FOUND, SEARCH ATTRIBUTES (example name)
for author in tree.findall(xPathAuthorById):
    name = author.find(xPathAuthorName)
    print(name.text)

This example prints the author name for the ID 829076996

UPDATE 2

You can easily process all assignedEntity tags with a findall method. For each of them you can have multiple products, so another findall method is needed (see example below).

xPathAssignedEntities = ''.join([
    ".//",
    NS, "author/",
    NS, "assignedEntity/",
    NS, "representedOrganization/",
    NS, "assignedEntity/", 
    NS, "assignedOrganization/", 
    NS, "assignedEntity"
    ])

xPathProdCode = ''.join([
    NS, "actDefinition/",
    NS, "product/",
    NS, "manufacturedProduct/",
    NS, "manufacturedMaterialKind/",
    NS, "code"
    ])


# GET ALL assignedEntity TAGS
for assignedEntity in tree.findall(xPathAssignedEntities):

    # GET ID AND NAME OF assignedEntity
    id = assignedEntity.find(NS + 'assignedOrganization/'+ NS + 'id').get('extension')
    name = assignedEntity.find(NS + 'assignedOrganization/' + NS + 'name').text

    # FOR EACH assignedEntity WE CAN HAVE MULTIPLE <performance> TAGS
    for performance in assignedEntity.findall(NS + 'performance'):
        actCode = performance.find(NS + 'actDefinition/'+ NS + 'code').get('displayName')
        prodCode = performance.find(xPathProdCode).get('code')
        print(id, '\t', name, '\t', actCode, '\t', prodCode)

This is the result:

829084545    Pfizer Pharmaceuticals LLC      ANALYSIS    0049-0050 
829084545    Pfizer Pharmaceuticals LLC      ANALYSIS    0049-4900 
829084545    Pfizer Pharmaceuticals LLC      ANALYSIS    0049-4910 
829084545    Pfizer Pharmaceuticals LLC      ANALYSIS    0049-4940 
829084545    Pfizer Pharmaceuticals LLC      ANALYSIS    0049-4960 
829084545    Pfizer Pharmaceuticals LLC      API MANUFACTURE     0049-0050
829084545    Pfizer Pharmaceuticals LLC      API MANUFACTURE     0049-4900
829084545    Pfizer Pharmaceuticals LLC      API MANUFACTURE     0049-4910
829084545    Pfizer Pharmaceuticals LLC      API MANUFACTURE     0049-4940
829084545    Pfizer Pharmaceuticals LLC      API MANUFACTURE     0049-4960
829084545    Pfizer Pharmaceuticals LLC      MANUFACTURE     0049-4900 
829084545    Pfizer Pharmaceuticals LLC      MANUFACTURE     0049-4910 
829084545    Pfizer Pharmaceuticals LLC      MANUFACTURE     0049-4960 
829084545    Pfizer Pharmaceuticals LLC      PACK    0049-4900 
829084545    Pfizer Pharmaceuticals LLC      PACK    0049-4910 
829084545    Pfizer Pharmaceuticals LLC      PACK    0049-4960 
618054084    Pharmacia and Upjohn Company LLC    ANALYSIS    0049-0050
618054084    Pharmacia and Upjohn Company LLC    ANALYSIS    0049-4940
829084552    Pfizer Pharmaceuticals LLC      PACK    0049-4900 
829084552    Pfizer Pharmaceuticals LLC      PACK    0049-4910 
829084552    Pfizer Pharmaceuticals LLC      PACK    0049-4960
like image 200
manuel_b Avatar answered Sep 21 '22 00:09

manuel_b


You could use xmltodict in order to generate a python dictionary from the requested XML data..

Here's a basic example:

import urllib2
import xmltodict

def foobar(request):
    file = urllib2.urlopen('https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/fe9e8b7d-61ea-409d-84aa-3ebd79a046b5.xml')
    data = file.read()
    file.close()

    data = xmltodict.parse(data)
    return {'xmldata': data}
like image 29
iLuvLogix Avatar answered Sep 22 '22 00:09

iLuvLogix