Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

FST (Finite-state transducers) Libraries, C++ or java

I have a problem to solve using FSTs. Basically, I'll make a morphological parser, and in this moment i have to work with large transducers. The performance is The Big issue here.

Recently, i worked in c++ in other projects where the performance matters, but now, i'am considering java, because the java's benefits and because java is getting better.

I studied some comparisons between java and c++, but I cannot decide what language i should use for this specific problem because it depends on lib in use.

I can´t find much information about java's libs, so, my question is: Are there any open source java libs in which the performance is good, like The RWTH FSA Toolkit that i read in an article that is the fastest c++ lib?

Thanks all.

like image 315
Cdiniz Avatar asked Nov 03 '10 15:11

Cdiniz


People also ask

What is the difference between an FSA and an FST?

An FSA defines a formal language by defining a set of accepted strings, while an FST defines relations between sets of strings. An FST will read a set of strings on the input tape and generates a set of relations on the output tape. An FST can be thought of as a translator or relater between strings in a set.

What is Finite State Morphology in NLP?

Finite-State Morphology. Morphology is a domain of linguistics that studies the formation of words. It is traditional to distinguish between surface forms and their analyses, called lemmas.

What is WFST?

A finite-state transducer (FST) has arcs labeling the input and output labels. A path through the transducer maps an input label sequence to an output label sequence. WFST is the weighted version of the finite-state transducer.

How do you write a Finite State Transducer?

Definition 1 A Finite State Transducer (FST) is a 5-tuple T = (Q,Σ,Γ, δ, s, γ) where • Q is a finite set of states, • Σ is a finite set of input symbols, • Γ is a finite set of output symbols, • δ:Q × Σ → Q is the transition function, • s ∈ Q is the start state. γ:Q → Γ∗ is the output function.


1 Answers

What are the "benefits" of Java, for your purposes? What specific problem does that platform solve that you need? What is the performance constraint you must consider? Were the "comparisons" fair, because Java is actually extremely difficult to benchmark. So is C++, but you can at least get some algorithmic boundary guarantees from STL.

I suggest you look at OpenFst and the AT&T finite-state transducer tools. There are others out there, but I think your worry about Java puts the cart before the horse-- focus on what solves your problem well.

Good luck!

like image 151
just me Avatar answered Nov 02 '22 04:11

just me