TextGrid is the "segmentation" file format used by the Praat program. I'd like to write a parser for it so that I can then verify the data. My question is:
How would you write a parser for this format? Would you read it line by line, or do something else? Is this a known format?
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 93.0538775510204
tiers? <exists>
size = 3
item []:
    item [1]:
        class = "IntervalTier"
        name = "diph"
        xmin = 0
        xmax = 93.0538775510204
        intervals: size = 65
        intervals [1]:
            xmin = 0
            xmax = 1.300090702947846
            text = ""
        intervals [2]:
            xmin = 1.300090702947846
            xmax = 1.5300845864661654
            text = "ey_s"
        intervals [3]:
            xmin = 1.5300845864661654
            xmax = 3.4648692624493815
            text = ""
(This pattern then repeats to EOF, with intervals [4] through [n].)
A TextGrid parser already exists as part of NLTK (in the nltk_contrib package). The Python file is here:
http://nltk.googlecode.com/svn/trunk/nltk_contrib/nltk_contrib/textgrid.py
Updated link: https://github.com/nltk/nltk_contrib/blob/master/nltk_contrib/textgrid.py
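If you would rather not pull in nltk_contrib, reading the file line by line is workable for the long text format shown in the question. The following is a minimal sketch, not the NLTK implementation: the names `parse_textgrid` and `check_tier` are made up for illustration, and it assumes interval tiers only, one `key = value` pair per line, and no text values that span several lines.

```python
import re

# Matches lines such as `xmin = 0` or `text = "ey_s"`.
KEY_VALUE = re.compile(r'^(\w+)\s*=\s*(.*)$')
# Matches `item [1]:` and `intervals [2]:`, but not the bare `item []:` header.
INDEXED = re.compile(r'^(item|intervals)\s*\[(\d+)\]:$')


def parse_textgrid(lines):
    """Parse long-format TextGrid lines into a list of tier dicts.

    Assumption: interval tiers only; file-level header lines are skipped.
    """
    tiers = []
    current = None  # the dict that currently receives key/value pairs
    for raw in lines:
        line = raw.strip()
        header = INDEXED.match(line)
        if header:
            if header.group(1) == "item":
                # A new tier starts; its intervals are collected in a list.
                current = {"intervals": []}
                tiers.append(current)
            else:
                # A new interval inside the most recent tier.
                current = {}
                tiers[-1]["intervals"].append(current)
            continue
        pair = KEY_VALUE.match(line)
        if pair and current is not None:
            key, value = pair.groups()
            if value.startswith('"'):
                # Quoted string; a doubled quote stands for a literal quote.
                current[key] = value[1:-1].replace('""', '"')
            else:
                current[key] = float(value)
    return tiers


def check_tier(tier):
    """Return a list of problems found in one interval tier."""
    problems = []
    expected_start = tier["xmin"]
    for number, interval in enumerate(tier["intervals"], start=1):
        if interval["xmin"] != expected_start:
            problems.append(
                f"interval {number} starts at {interval['xmin']}, "
                f"expected {expected_start}"
            )
        if interval["xmax"] <= interval["xmin"]:
            problems.append(f"interval {number} has non-positive duration")
        expected_start = interval["xmax"]
    if expected_start != tier["xmax"]:
        problems.append("last interval does not end at the tier's xmax")
    return problems
```

To try it, pass an open file object (or any iterable of lines) to `parse_textgrid` and run `check_tier` on each returned tier. The exact float comparison relies on adjacent boundaries being written with identical digits, as in the sample above; a tolerance may be more appropriate for hand-edited files. Praat may also save files in an encoding other than UTF-8, so the `encoding` argument to `open` may need adjusting.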
Automatic Praat's TextGrid File Parser (TGP) is a small application that parses Praat's TextGrid files. The result of the parsing is a spreadsheet saved to an output text file, which can be imported by applications such as Excel. TGP is meant to be a flexible program that can easily be extended or modified; it is currently capable of analyzing certain types of TextGrid files. Version 1.0 of TGP reads TextGrid files with the following item types: word, phone and, optionally, focus.
http://tgp.peremila.com/
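If all you need is the same kind of result (a text file that Excel can import), it is easy to produce one yourself once the intervals are parsed. The sketch below is an illustration only and does not reproduce TGP's actual output layout; the function name `intervals_to_tsv` and the column names are assumptions, and the sample rows reuse values from the question.

```python
import csv
import io


def intervals_to_tsv(rows, out):
    """Write (tier, xmin, xmax, text) rows to `out` as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["tier", "xmin", "xmax", "text"])  # hypothetical header
    writer.writerows(rows)


# Example rows taken from the TextGrid in the question.
rows = [
    ("diph", 0.0, 1.300090702947846, ""),
    ("diph", 1.300090702947846, 1.5300845864661654, "ey_s"),
]
buffer = io.StringIO()  # stands in for a real output file
intervals_to_tsv(rows, buffer)
```

For a real file, replace the `StringIO` buffer with `open(path, "w", newline="", encoding="utf-8")`; tab-separated text opens in Excel through its text import dialog.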