How to drop null values from dynamic loop generated from Python?

I have a DataFrame like this:

   ORDER_NO    2401        2504        2600
   2020020     2019-12-04  2019-12-10  2019-12-12
   2020024     2019-12-25  NaN         2019-12-20
   2020034     NaN         NaN         2019-12-20
   2020020     2019-12-12  2019-12-15  2019-12-18
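For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame. It assumes the dates are plain strings, the missing cells are `np.nan`, and the column labels are the strings `'2401'`, `'2504'`, `'2600'` (as they would be after reading a CSV); the real dtypes may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the frame above (assumed: string dates,
# np.nan for missing cells, string column labels).
df = pd.DataFrame(
    {
        "ORDER_NO": [2020020, 2020024, 2020034, 2020020],
        "2401": ["2019-12-04", "2019-12-25", np.nan, "2019-12-12"],
        "2504": ["2019-12-10", np.nan, np.nan, "2019-12-15"],
        "2600": ["2019-12-12", "2019-12-20", "2019-12-20", "2019-12-18"],
    }
)

# Number of missing cells per column
print(df.isna().sum())
```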

I am generating XML from the above DataFrame and want to keep null values out of it: when a cell is NaN, the code should skip that column/row pair instead of writing it to the XML.

My code:

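# df is the DataFrame shown above
header = """<ORD>{}</ORD>"""
body ="""
<osi:ORDSTSINF types:STSCDE="{}">
<DTM>{}</DTM>"""

cols = df.columns
for row in df.itertuples():
    # row[0] is the index, row[1] is ORDER_NO, row[2:] are the date cells
    with open(f'{row[1]}.xml', 'w') as f:
        f.write(header.format(row[1]))
        # pair each cell value (c) with its column label (r)
        for c, r in zip(row[2:], cols[1:]):
            f.write(body.format(r, c))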

Current output for record 2

<ORD>2020024</ORD>
<osi:ORDSTSINF types:STSCDE="2401">
<DTM>2019-12-25</DTM>
<osi:ORDSTSINF types:STSCDE="2504">
<DTM>NaN</DTM>
<osi:ORDSTSINF types:STSCDE="2600">
<DTM>2019-12-20</DTM>

Expected output for record 2

<ORD>2020024</ORD>
<osi:ORDSTSINF types:STSCDE="2401">
<DTM>2019-12-25</DTM>
<osi:ORDSTSINF types:STSCDE="2600">
<DTM>2019-12-20</DTM>

How can this be done in Python?


asked Feb 20 '20 16:02

Ria Alves

1 Answer

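Use `stack`. It reshapes the frame into one entry per (ORDER_NO, column) pair, and the classic implementation drops null cells along the way. The newer implementation (opt-in through `future_stack=True` since pandas 2.1) keeps them, so chaining `.dropna()` makes the intent explicit regardless of version.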

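import pandas as pd

header = """<ORD>{}</ORD>"""
body ="""
<osi:ORDSTSINF types:STSCDE="{}">
<DTM>{}</DTM>"""

# One (ORDER_NO, status code) -> date entry per non-null cell
stacked = df.set_index('ORDER_NO').stack().dropna()

for o, d in stacked.groupby(level='ORDER_NO'):
    with open(f'{o}.xml', 'w') as f:
        f.write(header.format(o))
        # Series.iteritems() was removed in pandas 2.0; items() replaces it
        for (_, s), date in d.items():
            f.write(body.format(s, date))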
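The same stack-based logic can be wrapped in a function that returns the XML text per order instead of writing files, which makes the result easy to inspect or test. This is a sketch: the name `build_xml_by_order` is hypothetical, and the sample frame repeats the assumptions of string dates and `np.nan` for gaps.

```python
import numpy as np
import pandas as pd

HEADER = "<ORD>{}</ORD>"
BODY = '\n<osi:ORDSTSINF types:STSCDE="{}">\n<DTM>{}</DTM>'


def build_xml_by_order(df: pd.DataFrame) -> dict:
    """Hypothetical helper: return {order_no: xml_text} via stack + groupby."""
    # Long format; .dropna() guarantees no NaN reaches the templates
    stacked = df.set_index("ORDER_NO").stack().dropna()
    out = {}
    for order_no, group in stacked.groupby(level="ORDER_NO"):
        parts = [HEADER.format(order_no)]
        # Each index entry is (ORDER_NO, status code); the value is the date
        for (_, code), date in group.items():
            parts.append(BODY.format(code, date))
        out[order_no] = "".join(parts)
    return out


# Sample frame (assumed dtypes, see above)
df = pd.DataFrame(
    {
        "ORDER_NO": [2020020, 2020024, 2020034, 2020020],
        "2401": ["2019-12-04", "2019-12-25", np.nan, "2019-12-12"],
        "2504": ["2019-12-10", np.nan, np.nan, "2019-12-15"],
        "2600": ["2019-12-12", "2019-12-20", "2019-12-20", "2019-12-18"],
    }
)

xml = build_xml_by_order(df)
print(xml[2020024])
```

Printing `xml[2020024]` shows only the 2401 and 2600 entries, matching the expected output in the question.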

Details: the stacked intermediate looks like this:

df.set_index('ORDER_NO').stack()

ORDER_NO      
2020020   2401   2019-12-04
          2504   2019-12-10
          2600   2019-12-12
2020024   2401   2019-12-25
          2600   2019-12-20
2020034   2600   2019-12-20
2020020   2401   2019-12-12
          2504   2019-12-15
          2600   2019-12-18
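One behavioural difference is worth checking: the sample data has ORDER_NO 2020020 twice. Grouping the stacked Series puts both of those rows into a single `2020020.xml`, while the question's row-by-row loop opens the same file twice in `'w'` mode, so the later row overwrites the earlier one. A small sketch (same assumed sample frame) that counts what each approach would emit per order:

```python
import numpy as np
import pandas as pd

# Sample frame (assumed: string dates, np.nan for missing cells)
df = pd.DataFrame(
    {
        "ORDER_NO": [2020020, 2020024, 2020034, 2020020],
        "2401": ["2019-12-04", "2019-12-25", np.nan, "2019-12-12"],
        "2504": ["2019-12-10", np.nan, np.nan, "2019-12-15"],
        "2600": ["2019-12-12", "2019-12-20", "2019-12-20", "2019-12-18"],
    }
)

stacked = df.set_index("ORDER_NO").stack().dropna()

# stack + groupby: duplicate order numbers are merged into one group
per_order = stacked.groupby(level="ORDER_NO").size()
print(per_order.to_dict())

# Row-by-row loop: only the last row per order survives on disk
last_row = df.drop_duplicates("ORDER_NO", keep="last").set_index("ORDER_NO")
print(last_row.notna().sum(axis=1).to_dict())
```

Which behaviour is right depends on whether duplicate order numbers should be merged or are a data error.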

Alternatively, your original loop works if you skip null cells with an `if`:

import pandas as pd

header = """<ORD>{}</ORD>"""
body ="""
<osi:ORDSTSINF types:STSCDE="{}">
<DTM>{}</DTM>"""

cols = df.columns
for row in df.itertuples():
    with open(f'{row[1]}.xml', 'w') as f:
        f.write(header.format(row[1]))
        for c, r in zip(row[2:], cols[1:]):
            # pd.notna is False for NaN/None, so null cells are skipped
            if pd.notna(c):
                f.write(body.format(r, c))
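To check the `if`-based loop end to end without cluttering the working directory, here is a sketch that runs it against the sample frame inside a temporary directory and reads one file back. Using `tempfile` is an assumption for testing only; the loop body is the one above, and the sample frame assumes string dates with `np.nan` for gaps.

```python
import os
import tempfile

import numpy as np
import pandas as pd

header = """<ORD>{}</ORD>"""
body = """
<osi:ORDSTSINF types:STSCDE="{}">
<DTM>{}</DTM>"""

# Sample frame (assumed dtypes)
df = pd.DataFrame(
    {
        "ORDER_NO": [2020020, 2020024, 2020034, 2020020],
        "2401": ["2019-12-04", "2019-12-25", np.nan, "2019-12-12"],
        "2504": ["2019-12-10", np.nan, np.nan, "2019-12-15"],
        "2600": ["2019-12-12", "2019-12-20", "2019-12-20", "2019-12-18"],
    }
)

# Write into a throwaway directory instead of the current one
outdir = tempfile.mkdtemp()
cols = df.columns
for row in df.itertuples():
    with open(os.path.join(outdir, f"{row[1]}.xml"), "w") as f:
        f.write(header.format(row[1]))
        for c, r in zip(row[2:], cols[1:]):
            if pd.notna(c):  # skip NaN cells entirely
                f.write(body.format(r, c))

with open(os.path.join(outdir, "2020024.xml")) as f:
    content = f.read()
print(content)
```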


answered Oct 16 '22 22:10

piRSquared

Tags: loops, python-3.x, pandas, dataframe, itertools