I have a data-frame like this:
ORDER_NO 2401 2504 2600
2020020 2019-12-04 2019-12-10 2019-12-12
2020024 2019-12-25 NaN 2019-12-20
2020034 NaN NaN 2019-12-20
2020020 2019-12-12 2019-12-15 2019-12-18
I am creating XML from the above data-frame. I want remove the null value being populated into the XML. My code should drop that particular column and row value from XML.
My code
header = """<ORD>{}</ORD>"""
body ="""
<osi:ORDSTSINF types:STSCDE="{}">
<DTM>{}</DTM>"""
cols = df.columns
for row in df.itertuples():
with open(f'{row[1]}.xml', 'w') as f:
f.write(header.format(row[1]))
for c, r in zip(row[2:], cols[1:]):
f.write(body.format(r, c))
Current output for record 2
<ORD>2020024</ORD>
<osi:ORDSTSINF types:STSCDE="2401">
<DTM>2019-12-25</DTM>
<osi:ORDSTSINF types:STSCDE="2504">
<DTM>NaN</DTM>
<osi:ORDSTSINF types:STSCDE="2600">
<DTM>2019-12-20</DTM>
Expected output for record 2
<ORD>2020024</ORD>
<osi:ORDSTSINF types:STSCDE="2401">
<DTM>2019-12-25</DTM>
<osi:ORDSTSINF types:STSCDE="2600">
<DTM>2019-12-20</DTM>
How can this be done in Python?
By using dropna() method you can drop rows with NaN (Not a Number) and None values from pandas DataFrame. Note that by default it returns the copy of the DataFrame after removing rows. If you wanted to remove from the existing DataFrame, you should use inplace=True .
Drop all rows having at least one null valueDataFrame. dropna() method is your friend. When you call dropna() over the whole DataFrame without specifying any arguments (i.e. using the default behaviour) then the method will drop all rows with at least one missing value.
stack
Naturally drops the nulls
header = """<ORD>{}</ORD>"""
body ="""
<osi:ORDSTSINF types:STSCDE="{}">
<DTM>{}</DTM>"""
for o, d in df.set_index('ORDER_NO').stack().groupby('ORDER_NO'):
with open(f'{o}.xml', 'w') as f:
f.write(header.format(o))
for (o, s), date in d.iteritems():
f.write(body.format(s, date))
df.set_index('ORDER_NO').stack()
ORDER_NO
2020020 2401 2019-12-04
2504 2019-12-10
2600 2019-12-12
2020024 2401 2019-12-25
2600 2019-12-20
2020034 2600 2019-12-20
2020020 2401 2019-12-12
2504 2019-12-15
2600 2019-12-18
Your solution would be fine with an if
header = """<ORD>{}</ORD>"""
body ="""
<osi:ORDSTSINF types:STSCDE="{}">
<DTM>{}</DTM>"""
cols = df.columns
for row in df.itertuples():
with open(f'{row[1]}.xml', 'w') as f:
f.write(header.format(row[1]))
for c, r in zip(row[2:], cols[1:]):
if pd.notna(c):
f.write(body.format(r, c))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With