
How to parse this format (Praat TextGrid)

TextGrid is the "segmentation" file format used by the Praat program. I'd like to write a parser for it and then verify the data. My question is:

How would you write a parser for this format? Read it line by line or something else? Is this a known format?

File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0 
xmax = 93.0538775510204 
tiers? <exists> 
size = 3 

item []: 
    item [1]:
        class = "IntervalTier" 
        name = "diph" 
        xmin = 0 
        xmax = 93.0538775510204 
        intervals: size = 65 
        intervals [1]:
            xmin = 0 
            xmax = 1.300090702947846 
            text = "" 
        intervals [2]:
            xmin = 1.300090702947846 
            xmax = 1.5300845864661654 
            text = "ey_s" 
        intervals [3]:
            xmin = 1.5300845864661654 
            xmax = 3.4648692624493815 
            text = "" 

(This is then repeated to EOF, with intervals [4] through [n].)

marw asked May 29 '11 12:05


2 Answers

A TextGrid parser already exists; it is part of the NLTK toolkit (in nltk_contrib). The Python file is here:

http://nltk.googlecode.com/svn/trunk/nltk_contrib/nltk_contrib/textgrid.py

Updated link: https://github.com/nltk/nltk_contrib/blob/master/nltk_contrib/textgrid.py
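
If you would rather not pull in a dependency, reading the file line by line is workable for the long format shown in the question. The following is only a minimal sketch under some assumptions: it handles IntervalTier items only (no point tiers), it assumes every `text` value fits on one line, and the names `parse_textgrid` and `check_tier` are made up for illustration. It is not the NLTK implementation.

```python
import re

# Matches lines such as `xmin = 0` or `text = "ey_s"` (single-word key, then value).
KEY_VALUE = re.compile(r'^\s*(\w+)\s*=\s*(.*?)\s*$')
# Matches block headers such as `item [1]:` or `intervals [2]:`.
HEADER = re.compile(r'^\s*(item|intervals)\s*\[(\d+)\]:\s*$')


def parse_value(raw):
    """Convert a raw value to str (if quoted) or float (otherwise)."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        # Praat writes a double quote inside a string as two double quotes.
        return raw[1:-1].replace('""', '"')
    return float(raw)


def parse_textgrid(text):
    """Parse long-format TextGrid text into a list of tier dicts.

    Assumption: only interval tiers, one value per line.
    """
    tiers = []
    current = None  # dict that currently receives key/value pairs
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            if header.group(1) == 'item':
                current = {'intervals': []}
                tiers.append(current)
            else:
                current = {}
                tiers[-1]['intervals'].append(current)
            continue
        pair = KEY_VALUE.match(line)
        # Lines before the first `item [n]:` (file header, global xmin/xmax)
        # are skipped because `current` is still None.
        if pair and current is not None:
            key, raw = pair.groups()
            current[key] = parse_value(raw)
    return tiers


def check_tier(tier):
    """Return a list of problems found in one interval tier."""
    problems = []
    intervals = tier['intervals']
    for i, interval in enumerate(intervals, start=1):
        if interval['xmin'] > interval['xmax']:
            problems.append(f'interval {i}: xmin > xmax')
    for i in range(1, len(intervals)):
        if intervals[i]['xmin'] != intervals[i - 1]['xmax']:
            problems.append(f'gap or overlap before interval {i + 1}')
    return problems
```

Lines such as `intervals: size = 65` or `tiers? <exists>` match neither pattern and are ignored, so the declared sizes are not checked here; comparing them against the parsed counts would be a natural next verification step.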

marw answered Sep 23 '22 14:09


Automatic Praat's TextGrid File Parser (TGP) is a small application that parses Praat's TextGrid files. The result of the parsing is a spreadsheet saved to an output text file, which can be imported by applications such as Excel. TGP is meant to be a flexible program that can be easily extended or modified; it is currently capable of analyzing certain types of TextGrid files. Version 1.0 of TGP reads TextGrid files with the following item types: word, phone and, optionally, focus.

http://tgp.peremila.com/
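
To give an idea of what a spreadsheet-style text output can look like, here is a rough Python sketch that writes intervals as tab-separated text, which Excel can import. This is not TGP's code or its actual output layout; the function name `intervals_to_tsv`, the column names, and the dict keys are assumptions made for illustration.

```python
import csv
import io


def intervals_to_tsv(tier_name, intervals):
    """Render intervals (dicts with xmin, xmax, text) as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
    # Header row; the column names are arbitrary choices for this sketch.
    writer.writerow(['tier', 'xmin', 'xmax', 'text'])
    for interval in intervals:
        writer.writerow([tier_name, interval['xmin'],
                         interval['xmax'], interval['text']])
    return buffer.getvalue()
```

Writing the returned string to a `.txt` or `.tsv` file gives one row per interval.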

Pere Milà answered Sep 22 '22 14:09