I would like to extract a series of tables from a text file. The file looks something like the following. Each table heading follows a regular pattern, and there is a blank line at the end of each table. Eventually I want each table in a NumPy array, but once the lines of numeric data are isolated, converting them to an array is easy.
Contents of example.txt:
lines to ignore
Table AAA
- ----
1 3.5
3 6.8
55 9.933
more lines to ignore
more lines to ignore
Table BBB
- ----
2 5.0
5 6.8
99 9.933
even more lines to ignore
(Edit: added spaces before rows in columns)
From this, I would like a list, something like:
[
{ 'id' : 'AAA', 'data' : [[1,3.5],[3,6.8],[55,9.933]]},
{ 'id' : 'BBB', 'data' : [[2,5.0],[5,6.8],[99,9.933]]},
]
I have written plenty of one-off parsers for this, but I'd like to do something with templates based on what I've seen in the ttp Python package. Unfortunately for me, that package seems to be focused on networking configuration files, so none of the examples are that close to what I'm wanting to do.
If there is a better Python package to use, I'm open to suggestions.
Here is what I've started with:
import ttp
template = """
<group name="table data" method="table">
Table {{ tab_name }}
{{ x1 | ROW }}
</group>
"""
# Read the whole file into one string; the context manager closes the file.
with open('example.txt') as f:
    lines = f.read()
parser = ttp.ttp(data=lines, template=template)
parser.parse()
res = parser.result()
print(res)
But this doesn't separate the tables or ignore the interspersed lines of text.
In [11]: res
Out[11]:
[[{'table data': [{'x1': 'lines to ignore'},
{'tab_name': 'AAA'},
{'x1': '- ----'},
{'x1': '1 3.5'},
{'x1': '3 6.8'},
{'x1': '5 9.933'},
{'x1': 'more lines to ignore'},
{'x1': 'more lines to ignore'},
{'tab_name': 'BBB'},
{'x1': '- ----'},
{'x1': '2 5.0'},
{'x1': '5 6.8'},
{'x1': '99 9.933'},
{'x1': 'even more lines to ignore'}]}]]
This template:
<group name="tables*">
Table {{ id }}
<group name="data" itemize="row">
{{ ignore("\s{1,3}") }} {{ row | ROW | exclude("---") | split(" ") }}
</group>
</group>
Gives this result:
[
{
"tables": [
{
"data": [["1", "3.5"], ["3", "6.8"], ["55", "9.933"]],
"id": "AAA"
},
{
"data": [["2", "5.0"], ["5", "6.8"], ["99", "9.933"]],
"id": "BBB"
}
]
}
]
This is fairly close to the desired result stated in the original post.
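The captured values are strings rather than numbers. Below is a minimal sketch of converting them to the shape asked for in the question, assuming the result has exactly the structure shown above. The names `res` and `to_numeric` are illustrative and not part of ttp. Note that the run shown in the question has `parser.result()` wrapping the results in one more list, in which case the tables would sit under `res[0][0]` instead of `res[0]`.

```python
# Result copied from the output above; in practice it would come from the parser.
res = [
    {
        "tables": [
            {"data": [["1", "3.5"], ["3", "6.8"], ["55", "9.933"]], "id": "AAA"},
            {"data": [["2", "5.0"], ["5", "6.8"], ["99", "9.933"]], "id": "BBB"},
        ]
    }
]


def to_numeric(tables):
    """Convert each row from [str, str] to [int, float] (hypothetical helper)."""
    return [
        {"id": t["id"], "data": [[int(a), float(b)] for a, b in t["data"]]}
        for t in tables
    ]


tables = to_numeric(res[0]["tables"])
print(tables)
```

If NumPy is the final target, something like `numpy.array(t["data"], dtype=float)` for each table `t` should then give one 2-D array per table.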
How it works
{{ ignore("\s{1,3}") }} (plus the literal space that follows it in the template) - makes sure that only lines with 2-4 spaces in front of them are matched, filtering out "lines to ignore".
{{ row | ROW | exclude("---") | split(" ") }} - the ROW regex captures an entire line that has several spaces between items; exclude("---") filters out separator lines such as "- ----"; split(" ") transforms the matched string into a list of items.
itemize="row" - a group function that combines the row match results into a list.
Hope that helps.
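For comparison, the same filtering steps can be sketched in plain Python with the standard `re` module. This is not how ttp works internally; it only mirrors the logic described above. It assumes that data rows are indented by 2-4 spaces and that a blank or non-indented line ends a table. The sample text (with its indentation and blank lines) and the `parse_tables` name are illustrative.

```python
import re

# Assumed layout: rows indented by two spaces, blank line after each table.
SAMPLE = """lines to ignore
Table AAA
  -  ----
  1  3.5
  3  6.8
  55 9.933

more lines to ignore
Table BBB
  -  ----
  2  5.0
  5  6.8
  99 9.933

even more lines to ignore
"""

HEADER = re.compile(r"^Table (\S+)$")   # table heading, captures the id
ROW = re.compile(r"^ {2,4}(\S.*)$")     # line with 2-4 leading spaces


def parse_tables(text):
    tables = []
    current = None  # table being filled, or None when outside a table
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {"id": header.group(1), "data": []}
            tables.append(current)
        elif current is not None:
            row = ROW.match(line)
            if row is None:
                current = None  # blank or non-indented line ends the table
            elif "---" not in row.group(1):  # skip the separator line
                # split() with no argument collapses runs of spaces
                current["data"].append(row.group(1).split())
    return tables


print(parse_tables(SAMPLE))
```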
P.S. You can try the template above here: https://textfsm.nornir.tech/