How can I grab data series from xml or tcx file

Tags: python, parsing, xml, beautifulsoup, xpath

I want to extract the data between specific tags in a .tcx file (an XML format) with Python.
The file format is as follows.

 <Track>
      <Trackpoint>
        <Time>2015-08-29T22:04:39.000Z</Time>
        <Position>
          <LatitudeDegrees>37.198049426078796</LatitudeDegrees>
          <LongitudeDegrees>127.07204628735781</LongitudeDegrees>
        </Position>
        <AltitudeMeters>34.79999923706055</AltitudeMeters>
        <DistanceMeters>7.309999942779541</DistanceMeters>
        <HeartRateBpm>
          <Value>102</Value>
        </HeartRateBpm>
        <Cadence>76</Cadence>
        <Extensions>
          <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
            <Watts>112</Watts>
          </TPX>
        </Extensions>
      </Trackpoint>
....Lots of <Trackpoint> ... </Trackpoint>
</Track>

Eventually, I want to build a data table with columns such as 'Latitude, Altitude, ... Watts'.
First, I tried to make a list from the tagged data (like <Watts> ... </Watts>) with BeautifulSoup, XPath, etc., but I am new to these tools. How can I grab the data between tags in an XML file with Python?

asked Sep 10 '15 13:09

Young Dong Kwon

1 Answer

You could use the lxml module, along with XPath. lxml is good for parsing XML/HTML, traversing element trees and returning element text/attributes. You can select particular elements, sets of elements or attributes of elements using XPath. Using your example data:

content = '''
<Track>
      <Trackpoint>
        <Time>2015-08-29T22:04:39.000Z</Time>
        <Position>
          <LatitudeDegrees>37.198049426078796</LatitudeDegrees>
          <LongitudeDegrees>127.07204628735781</LongitudeDegrees>
        </Position>
        <AltitudeMeters>34.79999923706055</AltitudeMeters>
        <DistanceMeters>7.309999942779541</DistanceMeters>
        <HeartRateBpm>
          <Value>102</Value>
        </HeartRateBpm>
        <Cadence>76</Cadence>
        <Extensions>
          <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
            <Watts>112</Watts>
          </TPX>
        </Extensions>
      </Trackpoint>
....Lots of <Trackpoint> ... </Trackpoint>
</Track>
'''

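from lxml import etree

# Parse the XML string; the returned element is the root <Track>.
tree = etree.XML(content)
# Select the text of every <Time> that is a child of a <Trackpoint>.
times = tree.xpath('Trackpoint/Time/text()')

print(times)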

Output

['2015-08-29T22:04:39.000Z']
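
The same idea can be extended to collect every field of each Trackpoint into one row, which is close to the data table the question asks for. Below is a minimal sketch, not a definitive implementation. It swaps in the standard library's `xml.etree.ElementTree` instead of `lxml` so that it runs without an extra install; the path expressions are the same kind of relative paths as above. The names `SAMPLE`, `NS`, `COLUMNS` and `trackpoint_rows` are made up for illustration, the `ext` prefix is an arbitrary local alias, and the second, nearly empty Trackpoint is invented only to show how missing fields behave. The one non-obvious point is that `<Watts>` sits inside `<TPX>`, which declares its own namespace, so both need a prefix mapped to that namespace URI.

```python
import xml.etree.ElementTree as ET

# Same structure as the snippet in the question. The second Trackpoint is
# invented for illustration and deliberately lacks most fields.
SAMPLE = """<Track>
  <Trackpoint>
    <Time>2015-08-29T22:04:39.000Z</Time>
    <Position>
      <LatitudeDegrees>37.198049426078796</LatitudeDegrees>
      <LongitudeDegrees>127.07204628735781</LongitudeDegrees>
    </Position>
    <AltitudeMeters>34.79999923706055</AltitudeMeters>
    <DistanceMeters>7.309999942779541</DistanceMeters>
    <HeartRateBpm><Value>102</Value></HeartRateBpm>
    <Cadence>76</Cadence>
    <Extensions>
      <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">
        <Watts>112</Watts>
      </TPX>
    </Extensions>
  </Trackpoint>
  <Trackpoint>
    <Time>2015-08-29T22:04:40.000Z</Time>
  </Trackpoint>
</Track>"""

# "ext" is an arbitrary prefix; only the URI has to match the document.
NS = {"ext": "http://www.garmin.com/xmlschemas/ActivityExtension/v2"}

# Column name -> path relative to a <Trackpoint> element (hypothetical names).
COLUMNS = {
    "Time": "Time",
    "Latitude": "Position/LatitudeDegrees",
    "Longitude": "Position/LongitudeDegrees",
    "Altitude": "AltitudeMeters",
    "Distance": "DistanceMeters",
    "HeartRate": "HeartRateBpm/Value",
    "Cadence": "Cadence",
    "Watts": "Extensions/ext:TPX/ext:Watts",
}


def trackpoint_rows(xml_text):
    """Return one dict per <Trackpoint>; fields that are absent become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for point in root.iter("Trackpoint"):
        row = {
            name: point.findtext(path, default=None, namespaces=NS)
            for name, path in COLUMNS.items()
        }
        rows.append(row)
    return rows


rows = trackpoint_rows(SAMPLE)
for row in rows:
    print(row)
```

All values come back as strings, so convert them with `float()` or `int()` before doing arithmetic. The list of dicts can be passed to `csv.DictWriter` or to `pandas.DataFrame(rows)` to get a table. One caveat, stated as an assumption about your file: if the root element of the complete .tcx file declares a default namespace, the un-prefixed names such as `Trackpoint` and `Time` will not match until they are given a prefix mapped to that namespace as well.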

answered Oct 04 '22 09:10

gtlambert
