
How do I parse numeric tables from a text file using templates in Python?

Tags: python, parsing

I would like to extract a series of tables from a text file. The file looks something like the following. Each table heading follows a regular pattern, and a blank line marks the end of each table. Eventually I want each table in a NumPy array, but once I can isolate the lines of numeric data, converting them to an array is easy.

Contents of example.txt:

lines to ignore

Table AAA

   -  ----
   1  3.5
   3  6.8
  55  9.933


more lines to ignore
more lines to ignore

Table BBB

   -  ----
   2  5.0
   5  6.8
  99  9.933

even more lines to ignore

(Edit: added spaces before rows in columns)

From this, I would like a list, something like:

[ 
   { 'id' : 'AAA', 'data' : [[1,3.5],[3,6.8],[55,9.933]]},
   { 'id' : 'BBB', 'data' : [[2,5.0],[5,6.8],[99,9.933]]},
]

I have written plenty of one-off parsers for this, but I'd like to use templates instead, based on what I've seen in the ttp Python package. Unfortunately, that package seems to be focused on networking configuration files, so none of its examples are close to what I want to do.

If there is a better Python package to use, I'm open to suggestions.

Here is what I've started with:

import ttp

template = """
<group name="table data" method="table">

Table {{ tab_name }}
{{ x1 | ROW }}

</group>
"""

# Read the whole file as one string; the context manager closes the file.
with open('example.txt') as f:
    lines = f.read()

parser = ttp.ttp(data=lines, template=template)
parser.parse()

res = parser.result()
print(res)

But this doesn't separate the tables or ignore the interspersed lines of text.

In [11]: res
Out[11]:
[[{'table data': [{'x1': 'lines to ignore'},
    {'tab_name': 'AAA'},
    {'x1': '-  ----'},
    {'x1': '1  3.5'},
    {'x1': '3  6.8'},
    {'x1': '5  9.933'},
    {'x1': 'more lines to ignore'},
    {'x1': 'more lines to ignore'},
    {'tab_name': 'BBB'},
    {'x1': '-  ----'},
    {'x1': '2  5.0'},
    {'x1': '5  6.8'},
    {'x1': '99  9.933'},
    {'x1': 'even more lines to ignore'}]}]]
Josh Hykes asked Nov 27 '25 04:11



1 Answer

This template:

<group name="tables*">
Table {{ id }}

<group name="data" itemize="row">
{{ ignore("\s{1,3}") }} {{ row | ROW | exclude("---") | split("  ") }}
</group>

</group>

Gives this result:

[
    {
        "tables": [
            {
                "data": [["1", "3.5"], ["3", "6.8"], ["55", "9.933"]],
                "id": "AAA"
            },
            {
                "data": [["2", "5.0"], ["5", "6.8"], ["99", "9.933"]],
                "id": "BBB"
            }
        ]
    }
]

This is fairly close to the desired result stated in the original post.
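The `data` rows in that result hold strings, while the question ultimately asks for numeric arrays, so one conversion step remains. A minimal sketch of that step, assuming the result has exactly the shape printed above; the helper name `to_numeric` is hypothetical and not part of ttp:

```python
# Result copied from the output above; every value is still a string.
result = [
    {
        "tables": [
            {"data": [["1", "3.5"], ["3", "6.8"], ["55", "9.933"]], "id": "AAA"},
            {"data": [["2", "5.0"], ["5", "6.8"], ["99", "9.933"]], "id": "BBB"},
        ]
    }
]


def to_numeric(tables):
    """Convert each [str, str] row to [int, float] (hypothetical helper)."""
    converted = []
    for table in tables:
        rows = [[int(first), float(second)] for first, second in table["data"]]
        converted.append({"id": table["id"], "data": rows})
    return converted


tables = to_numeric(result[0]["tables"])
print(tables[0])
```

If NumPy is installed, each converted `data` list can then be passed to `numpy.array`.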

How it works

  • {{ ignore("\s{1,3}") }} - ensures that only lines with 2-4 leading spaces match, which filters out the unindented "lines to ignore"
  • {{ row | ROW | exclude("---") | split("  ") }} - the ROW regex captures an entire line whose items are separated by several spaces, exclude("---") filters out separator lines such as "-  ----", and split("  ") (two spaces) turns the matched string into a list of items
  • itemize="row" - a group function that combines the row match results into a list
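For readers who want to see the same three steps without the template engine, here is a rough sketch using only the standard `re` module. It is an illustration of the filtering logic described above, not ttp's internal implementation; the regular expressions and the function name `parse_tables` are assumptions made for this example.

```python
import re

TEXT = """\
lines to ignore

Table AAA

   -  ----
   1  3.5
   3  6.8
  55  9.933


more lines to ignore
more lines to ignore

Table BBB

   -  ----
   2  5.0
   5  6.8
  99  9.933

even more lines to ignore
"""

# Assumption: a heading is the word "Table" followed by one identifier.
HEADING = re.compile(r"^Table (\S+)$")
# Assumption: a table row is indented by 2-4 whitespace characters.
ROW = re.compile(r"^\s{2,4}(\S.*)$")


def parse_tables(text):
    """Group indented rows under the most recent 'Table <id>' heading."""
    tables = []
    current = None
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = {"id": heading.group(1), "data": []}
            tables.append(current)
            continue
        row = ROW.match(line)
        if current is None or not row:
            continue  # unindented text and blank lines are skipped
        content = row.group(1).rstrip()
        if "---" in content:
            continue  # drop the "-  ----" separator line
        current["data"].append(content.split("  "))  # split on two spaces
    return tables


parsed = parse_tables(TEXT)
print(parsed)
```

As with the template, this relies on the indentation of the rows: an indented line of ordinary text following a table would be picked up as a row.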

Hope that helps.

P.S. You can try the template above at https://textfsm.nornir.tech/

apraksim answered Nov 29 '25 17:11
