Python: Extracting XML to DataFrame (Pandas)

Tags:

python

pandas

dataframe

xml

I have an XML file that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<comments>
<row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
<row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" />
<row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
</comments>

What I'm trying to do is extract the ID, Text and CreationDate columns into a pandas DataFrame, and I've tried the following:

import xml.etree.cElementTree as et
import pandas as pd
path = '/.../...'
dfcols = ['ID', 'Text', 'CreationDate']
df_xml = pd.DataFrame(columns=dfcols)

root = et.parse(path)
rows = root.findall('.//row')
for row in rows:
    ID = row.find('Id')
    text = row.find('Text')
    date = row.find('CreationDate')
    print(ID, text, date)
    df_xml = df_xml.append(pd.Series([ID, text, date], index=dfcols), ignore_index=True)

print(df_xml)

But the output is: None None None

Could you please tell me how to fix this? Thanks.

asked Jun 09 '18 12:06

2 Answers

As advised in this solution by gold member Python/pandas/numpy guru, @unutbu:

Never call DataFrame.append or pd.concat inside a for-loop. It leads to quadratic copying.

Therefore, consider parsing your XML data into a separate list, then passing that list to the DataFrame constructor in a single call outside of any loop. In fact, you can build the nested list with a list comprehension and pass it directly to the constructor:

import xml.etree.ElementTree as et
import pandas as pd

path = 'AttributesXMLPandas.xml'
dfcols = ['ID', 'Text', 'CreationDate']

root = et.parse(path)
rows = root.findall('.//row')

# NESTED LIST
xml_data = [[row.get('Id'), row.get('Text'), row.get('CreationDate')] 
            for row in rows]

df_xml = pd.DataFrame(xml_data, columns=dfcols)

print(df_xml)

#   ID   Text             CreationDate
# 0  1  (...)  2011-08-30T21:15:28.063
# 1  2  (...)  2011-08-30T21:24:56.573
# 2  3  (...)                     None
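For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea: collect one record per row in a plain list, then build the DataFrame once. The inline SAMPLE_XML string and the helper name rows_to_frame are assumptions made for illustration, not part of the original code. Building the frame in one call also avoids DataFrame.append, which was removed in pandas 2.0.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Inline copy of the sample data from the question (assumption: used
# here only so the sketch runs without a file on disk).
SAMPLE_XML = """<comments>
  <row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
  <row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" />
  <row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
</comments>"""


def rows_to_frame(xml_text):
    """Hypothetical helper: one dict per <row>, one DataFrame call at the end."""
    root = ET.fromstring(xml_text)
    records = [
        {
            "ID": row.get("Id"),
            "Text": row.get("Text"),
            # get() returns None when the attribute is absent (third row)
            "CreationDate": row.get("CreationDate"),
        }
        for row in root.iter("row")
    ]
    # Build the DataFrame once, outside of any loop
    return pd.DataFrame(records, columns=["ID", "Text", "CreationDate"])


df_xml = rows_to_frame(SAMPLE_XML)
print(df_xml)
```

All values come back as strings, because XML attributes are text. Converting ID to an integer or CreationDate to a datetime would be a separate step.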

answered Oct 02 '22 22:10

Parfait

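Just a minor change in your code: Id, Text and CreationDate are attributes of each row element, not child elements, so find() returns None for them. Read them with get() instead: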

ID = row.get('Id')
text = row.get('Text')
date = row.get('CreationDate')
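To see the difference side by side, here is a small sketch using only the standard library. The two-row SAMPLE string is a shortened copy of the question's data and the variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's data (assumption: inline for the demo).
SAMPLE = """<comments>
  <row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
  <row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
</comments>"""

root = ET.fromstring(SAMPLE)
first, last = root.findall("row")

# find() searches for a child element named <Id>; the rows have no children
found = first.find("Id")

# get() reads the Id="..." attribute on the element itself
fetched = first.get("Id")

# A missing attribute gives None, or the default passed as second argument
missing = last.get("CreationDate")
fallback = last.get("CreationDate", "")

print(found, fetched, missing, repr(fallback))
```

The optional default in get() can be handy if you would rather store an empty string than None for rows without a CreationDate.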

answered Oct 02 '22 20:10

Prany

jabba
