I have an MS Excel XML (2003) file with the following metadata:
<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?><Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
I'd like to read it into a pandas dataframe. What's a good way to go about doing this? Thanks.
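The pandas read_excel() function reads Excel workbook data into a DataFrame object. An Excel sheet is a two-dimensional table, and a DataFrame is also a two-dimensional tabular data structure, so the mapping is natural. Note, however, that read_excel() supports the .xls, .xlsx, .xlsm, .xlsb and OpenDocument formats, not the 2003 XML (SpreadsheetML) format. For a file like this you may need to parse the XML yourself, or first re-save it from Excel as .xlsx.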
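Because a SpreadsheetML file is plain XML, one workaround is to parse it with the standard library's xml.etree.ElementTree and hand the rows to pandas. The sketch below assumes the usual layout Excel writes (Workbook > Worksheet > Table > Row > Cell > Data, in the `urn:schemas-microsoft-com:office:spreadsheet` namespace shown in the question's header). The helper name `read_spreadsheetml` is hypothetical. It only converts `ss:Type="Number"` cells to floats and leaves everything else (including DateTime) as text.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Namespace declared as xmlns:ss (and as the default xmlns) on <Workbook>
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def _attr(elem, name):
    """Return an ss:-prefixed attribute, falling back to an unprefixed one."""
    return elem.get(f"{{{SS}}}{name}", elem.get(name))


def _value(data):
    """Convert an ss:Data element to a Python value (numbers only; rest as text)."""
    if data is None:
        return None
    text = "".join(data.itertext())  # also collects rich-text child runs
    if _attr(data, "Type") == "Number":
        return float(text)
    return text


def read_spreadsheetml(source, sheet=0, header=True):
    """Hypothetical helper: read one worksheet of an Excel 2003 XML file.

    source -- path or binary file-like object
    sheet  -- worksheet position (int) or its ss:Name (str)
    header -- use the first row as column names
    """
    worksheets = ET.parse(source).getroot().findall("ss:Worksheet", NS)
    if isinstance(sheet, str):
        ws = next(w for w in worksheets if _attr(w, "Name") == sheet)
    else:
        ws = worksheets[sheet]

    rows = []
    for row in ws.findall("ss:Table/ss:Row", NS):
        values = []
        for cell in row.findall("ss:Cell", NS):
            # ss:Index is 1-based and means earlier empty cells were omitted
            index = _attr(cell, "Index")
            if index is not None:
                values.extend([None] * (int(index) - 1 - len(values)))
            values.append(_value(cell.find("ss:Data", NS)))
        rows.append(values)

    # Shorter rows are padded with missing values by the DataFrame constructor
    if header and rows:
        return pd.DataFrame(rows[1:], columns=rows[0])
    return pd.DataFrame(rows)


# Usage with a small in-memory sample (made-up data)
sample = b"""<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1">
  <Table>
   <Row><Cell><Data ss:Type="String">name</Data></Cell><Cell><Data ss:Type="String">score</Data></Cell></Row>
   <Row><Cell><Data ss:Type="String">Ann</Data></Cell><Cell><Data ss:Type="Number">3.5</Data></Cell></Row>
   <Row><Cell ss:Index="2"><Data ss:Type="Number">7</Data></Cell></Row>
  </Table>
 </Worksheet>
</Workbook>"""

df = read_spreadsheetml(io.BytesIO(sample))
print(df)
```

For a real file, pass its path instead of the BytesIO object. Merged cells (ss:MergeAcross) and date conversion are not handled here; pd.to_datetime() on the relevant columns is one way to add the latter.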
The XML file starts at the root of the tree, namely the <data> element, which contains the entire data structure. We can then iterate through each node of the tree: in this example, that means getting each student element and grabbing its name attribute and all of its sub-elements to build our DataFrame.
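A minimal sketch of that iteration, assuming a file shaped like the one described (a <data> root holding <student> elements with a name attribute). The <email> and <grade> sub-elements and their values are made up for illustration:

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample: <email> and <grade> are illustrative tag names
XML = """<data>
  <student name="alice"><email>alice@example.com</email><grade>A</grade></student>
  <student name="bob"><email>bob@example.com</email><grade>B</grade></student>
</data>"""

root = ET.fromstring(XML)  # root is the <data> element

records = []
for student in root.findall("student"):
    # Start each record with the name attribute...
    record = {"name": student.get("name")}
    # ...then add one column per sub-element, keyed by its tag
    for child in student:
        record[child.tag] = child.text
    records.append(record)

df = pd.DataFrame(records)
print(df)
```

Building a list of dicts and calling pd.DataFrame once is cheaper than appending rows one at a time. For shallow, repeated structures like this, pandas 1.3 and later also provide pd.read_xml(), which may do the same flattening in a single call.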
Did you try PyXLL for Canopy Python? It is advertised as a "Python for Excel Solution", so check it out and see if it solves your problem.