
How do you even give an (openFST-made) FST input? Where does the output go?


Before I start, note that I'm using the Linux shell (by calling subprocess.call() from Python), and I am using openFST.

I've been sifting through documents and questions about openFST, but I cannot seem to find an answer to this question: how does one actually give input to an openFST-defined, compiled and composed FST? Where does the output go? Do I simply execute 'fstproject'? If so, how would I, say, give it a string to transduce, and print the various transductions when the end-state(s) have been reached?

I apologize if this question seems obvious. I'm not very familiar with openFST as of yet.

Sterling asked Feb 22 '12 07:02



1 Answer

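One way is to create a machine that performs the transformation. A very simple example is a machine that upper-cases a string. In the text format, each arc line reads: source state, destination state, input symbol, output symbol. A line holding only a state number marks that state as final.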

M.wfst

0 0 a A
0 0 b B
0 0 c C
0
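Writing these arc lines by hand gets tedious for a larger alphabet. As a minimal sketch, the text file could be generated from Python; the function name uppercase_fst_text is hypothetical, and the code assumes every symbol is a single lower-case character with no whitespace.

```python
def uppercase_fst_text(alphabet):
    """Return text-format arcs for a one-state FST that upper-cases each symbol.

    Hypothetical helper: one self-loop on state 0 per symbol,
    then a final line marking state 0 as final.
    """
    lines = [f"0 0 {ch} {ch.upper()}" for ch in alphabet]
    lines.append("0")  # state 0 is the only (and final) state
    return "\n".join(lines) + "\n"


# The result can be written to M.wfst and passed to fstcompile.
print(uppercase_fst_text("abc"), end="")
```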

The accompanying symbols file contains one line for each symbol of the alphabet. Note that 0 is reserved for null (epsilon) transitions and has special meaning in many of the operations.

M.syms

<epsilon> 0
a 1
b 2
c 3
A 4
B 5
C 6
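A symbols file like this one can also be produced programmatically. The sketch below is only an illustration: symbol_table_text is a hypothetical helper, and it assumes the symbols are unique and contain no whitespace. It always reserves index 0 for the epsilon symbol.

```python
def symbol_table_text(symbols):
    """Return symbol-table text: '<epsilon> 0' first, then one 'symbol index' line each.

    Hypothetical helper; indices are assigned in the order the symbols are given.
    """
    lines = ["<epsilon> 0"]  # index 0 is reserved for epsilon
    lines += [f"{sym} {index}" for index, sym in enumerate(symbols, start=1)]
    return "\n".join(lines) + "\n"


# Reproduces the M.syms contents for the alphabet used in this answer.
print(symbol_table_text("abcABC"), end="")
```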

Then compile the machine

fstcompile --isymbols=M.syms --osymbols=M.syms M.wfst > M.ofst 

For an input string "abc", create a linear-chain automaton: a left-to-right chain with one arc per character. It is an acceptor, so only a single column of symbols is needed.

I.wfst

0 1 a
1 2 b
2 3 c
3
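Since the question is about feeding arbitrary strings in from Python, the chain acceptor can be generated rather than typed. This is a rough sketch, not part of OpenFst: linear_chain_text is a hypothetical name, and it assumes each character of the string is a symbol listed in M.syms.

```python
def linear_chain_text(s):
    """Return acceptor text for a left-to-right chain with one arc per character.

    Hypothetical helper: arc i goes from state i to state i + 1,
    and the last state is marked final.
    """
    lines = [f"{i} {i + 1} {ch}" for i, ch in enumerate(s)]
    lines.append(str(len(s)))  # final state
    return "\n".join(lines) + "\n"


# The text can be saved as I.wfst and compiled with --acceptor.
print(linear_chain_text("abc"), end="")
```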

Compile as an acceptor

fstcompile --isymbols=M.syms --acceptor I.wfst > I.ofst 

Then compose the machines and print

fstcompose I.ofst M.ofst | fstprint --isymbols=M.syms --osymbols=M.syms  

This will give the output

0   1   a   A
1   2   b   B
2   3   c   C
3

The output of fstcompose is a lattice of all transductions of the input string (in this case there is only one). If M.ofst is more complicated, fstshortestpath can be used to extract the n best strings using the flags --unique --nshortest=n. The result is again a transducer; you could either scrape the output of fstprint, or use C++ code and the OpenFst library to run a depth-first search to extract the strings.
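Scraping the fstprint output can itself be done with a small depth-first search in Python. The following is a sketch under several assumptions: the text is in transducer format (four or five columns per arc, with symbol names rather than numbers), the source state of the first line is the start state, and the lattice is acyclic. The name output_strings is hypothetical.

```python
def output_strings(fstprint_text, epsilon="<epsilon>"):
    """Collect the output-label string of every accepting path (acyclic lattices only)."""
    arcs = {}       # source state -> list of (destination state, output label)
    finals = set()  # states that appear on a final-state line
    start = None
    for line in fstprint_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = fields[0]  # assumption: the first line belongs to the start state
        if len(fields) >= 4:   # arc: src dst ilabel olabel [weight]
            src, dst, _ilabel, olabel = fields[:4]
            arcs.setdefault(src, []).append((dst, olabel))
        else:                  # final state: state [weight]
            finals.add(fields[0])

    results = []

    def dfs(state, labels):
        if state in finals:
            results.append(" ".join(labels))
        for dst, olabel in arcs.get(state, []):
            # Epsilon outputs contribute nothing to the string.
            dfs(dst, labels if olabel == epsilon else labels + [olabel])

    if start is not None:
        dfs(start, [])
    return results


sample = "0\t1\ta\tA\n1\t2\tb\tB\n2\t3\tc\tC\n3\n"
print(output_strings(sample))  # → ['A B C']
```

For a lattice with cycles this recursion would not terminate, so limiting the paths with fstshortestpath first is the safer route.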

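Inserting fstproject --project_output converts the result to an acceptor containing only the output labels. (In more recent OpenFst releases this flag appears to be spelled --project_type=output instead; check fstproject --help for your version.)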

fstcompose I.ofst M.ofst | fstproject --project_output |  fstprint --isymbols=M.syms --osymbols=M.syms  

Gives the following

0  1  A  A
1  2  B  B
2  3  C  C
3

This is an acceptor because the input and output labels are the same; the --acceptor option can be used to generate more succinct output.

 fstcompose I.ofst M.ofst | fstproject --project_output |  fstprint --isymbols=M.syms --acceptor 
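Because the question mentions calling the shell through subprocess, here is one way the same pipe might be driven from Python without a shell. It is a sketch only: pipeline_commands and run_pipeline are hypothetical helpers, the OpenFst binaries are assumed to be on PATH, the flags are the ones used in this answer, and exit codes are not checked.

```python
import subprocess


def pipeline_commands(input_fst, machine_fst, syms):
    """Return the argv lists for: fstcompose | fstproject | fstprint."""
    return [
        ["fstcompose", input_fst, machine_fst],
        # Flag as used in this answer; your OpenFst version may name it differently.
        ["fstproject", "--project_output"],
        ["fstprint", f"--isymbols={syms}", "--acceptor"],
    ]


def run_pipeline(commands):
    """Chain the commands like a shell pipe and return the last one's stdout as text."""
    procs = []
    prev_stdout = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # The child holds its own copy; closing ours lets upstream see a broken pipe.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0]
    for proc in procs[:-1]:
        proc.wait()
    return output.decode()


commands = pipeline_commands("I.ofst", "M.ofst", "M.syms")
# With OpenFst installed and the files compiled, this would print the acceptor:
# print(run_pipeline(commands))
```

Passing argument lists instead of a single shell string avoids quoting problems when file names or strings come from user input.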
Paul Dixon answered Oct 11 '22 19:10
