
Read XML Into Pandas DataFrame

Tags: python, pandas, xml

Just wondering if someone might be able to help me figure out where I've gone wrong in this Python script. I'm trying to read the US Light List weekly changes XML (found here: https://www.navcen.uscg.gov/sites/default/files/xml/lightLists/weeklyUpdates/v7d09WeeklyChanges.xml) into a pandas DataFrame. I don't program very often, so I'm starting off slowly by collecting the 'District' field for each aid into a list. When I run the script, the District column shows only None. How can I get the 'District' field into the DataFrame? Thanks.

from lxml import etree
from lxml import objectify
import pandas as pd
import pandas_read_xml as pdx
import xml.etree.ElementTree as et
 
xml_file = (r'C:\Users\LAWRENCEA\Downloads\v7d09WeeklyChanges.xml')
 
parsed_xml = et.parse(xml_file)
xroot = parsed_xml.getroot()
 
df_cols = ["District"]
rows = []
 
for record in xroot.iter('dataroot'):
    for field in record.findall('Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru'):
        s_District = field.attrib.get('District')
        rows.append({"District": s_District})
    
 
df = pd.DataFrame(rows, columns = df_cols)
print(df)
DroningVarlot asked Oct 20 '25 11:10


2 Answers

You are trying to select the District element, but you are using .attrib.get(), which looks up attributes written inside the record's start tag, not its child elements.

Instead, use .find():

s_District = field.find('District').text
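A minimal, self-contained sketch of the corrected loop. It parses a small hypothetical XML string instead of the downloaded file, so it runs offline: the record tag and the District element name come from the question, but the two records and their values are made up. It also shows the difference directly: .attrib.get('District') returns None, while the District child element's text is 9.

```python
import xml.etree.ElementTree as et

import pandas as pd

# Hypothetical sample with the same record layout as the question's file:
# District is a child *element* of each record, not an attribute of its tag.
SAMPLE_XML = """<dataroot>
  <Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru>
    <District>9</District>
  </Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru>
  <Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru>
    <District>9</District>
  </Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru>
</dataroot>"""

RECORD_TAG = "Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru"

xroot = et.fromstring(SAMPLE_XML)

first = xroot.find(RECORD_TAG)
print(first.attrib.get("District"))  # None: the tag has no District attribute
print(first.find("District").text)   # 9: text of the District child element

rows = []
for field in xroot.iter(RECORD_TAG):
    # findtext() returns the child's text, or None if the child is missing.
    rows.append({"District": field.findtext("District")})

df = pd.DataFrame(rows, columns=["District"])
# Element text is always a string; convert the column to numbers.
df["District"] = pd.to_numeric(df["District"])
print(df)
```

findtext() is used in the loop because the schema marks District with minOccurs="0": on a record without a District child, find('District') returns None and .text would raise AttributeError, while findtext() simply returns None.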
Mads Hansen answered Oct 22 '25 02:10


Alternatively, pandas.read_xml() (available since pandas 1.3) can load the records straight into a DataFrame, and you can then select the District column:

import pandas as pd

url = "https://www.navcen.uscg.gov/sites/default/files/xml/lightLists/weeklyUpdates/v7d09WeeklyChanges.xml"

df = pd.read_xml(url, xpath=".//Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru")
print(df['District'])

Output:

0       9
1       9
2       9
3       9
4       9
       ..
4798    9
4799    9
4800    9
4801    9
4802    9
Name: District, Length: 4803, dtype: int64

Print the whole table with print(df).
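To try pandas.read_xml() without downloading anything, here is a sketch that feeds it a hypothetical two-record sample through io.StringIO (recent pandas versions warn when literal XML strings are passed directly). It also passes parser="etree" so that only the standard library is needed; the default parser, "lxml", requires the lxml package. The sample data is made up; only the tag names come from the question.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-record sample standing in for the downloaded file.
SAMPLE_XML = """<dataroot>
  <Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru>
    <District>9</District>
  </Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru>
  <Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru>
    <District>9</District>
  </Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru>
</dataroot>"""

# Each element matched by xpath becomes one row; its child elements
# (here only District) become columns.
# The etree parser supports only a limited XPath subset, which still
# covers a ".//tag" search like this one.
df = pd.read_xml(
    StringIO(SAMPLE_XML),
    xpath=".//Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru",
    parser="etree",
)
print(df["District"])
```

Against the real data, replace StringIO(SAMPLE_XML) with the URL or local file path used above.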

As @Mads mentioned, District is an element, not an attribute. You can see this in the XSD schema at the start of the XML file:

<xsd:sequence>
  <xsd:element name="District" minOccurs="0" od:jetType="decimal" od:sqlSType="decimal">
    <xsd:simpleType>
      <xsd:restriction base="xsd:decimal">
        <xsd:totalDigits value="2"/>
        <xsd:fractionDigits value="0"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
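To check from code which fields are elements, you can list the names declared with xsd:element in the schema. The sketch below uses a simplified, hypothetical schema: only the District declaration mirrors the excerpt above, the wrapping record declaration is assumed, and the Access-specific od:* attributes are left out so no extra namespace declaration is needed.

```python
import xml.etree.ElementTree as et

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Simplified, hypothetical schema; the record wrapper is an assumption.
SCHEMA_XML = f"""<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:element name="Vol_x0020_07_x0020_D9_x0020_LL_x0020_corr_x0020_thru">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="District" minOccurs="0">
          <xsd:simpleType>
            <xsd:restriction base="xsd:decimal">
              <xsd:totalDigits value="2"/>
              <xsd:fractionDigits value="0"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

schema = et.fromstring(SCHEMA_XML)

# ElementTree expands the xsd: prefix to "{namespace-uri}localname".
element_tag = f"{{{XSD_NS}}}element"
attribute_tag = f"{{{XSD_NS}}}attribute"

# Names declared as xsd:element show up as child elements in the data,
# so they are read with find()/findtext().
declared_elements = [el.get("name") for el in schema.iter(element_tag)]
# Names declared as xsd:attribute would be read through .attrib.
declared_attributes = [a.get("name") for a in schema.iter(attribute_tag)]

print(declared_elements)
print(declared_attributes)
```

In this sample, District appears only in the element list and the attribute list is empty, which is why .attrib.get('District') finds nothing.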
Hermann12 answered Oct 22 '25 00:10


