I have a problem to solve using FSTs. Basically, I'll make a morphological parser, and in this moment i have to work with large transducers. The performance is The Big issue here.
Recently, i worked in c++ in other projects where the performance matters, but now, i'am considering java, because the java's benefits and because java is getting better.
I studied some comparisons between java and c++, but I cannot decide what language i should use for this specific problem because it depends on lib in use.
I can´t find much information about java's libs, so, my question is: Are there any open source java libs in which the performance is good, like The RWTH FSA Toolkit that i read in an article that is the fastest c++ lib?
Thanks all.
An FSA defines a formal language by defining a set of accepted strings, while an FST defines relations between sets of strings. An FST will read a set of strings on the input tape and generates a set of relations on the output tape. An FST can be thought of as a translator or relater between strings in a set.
Finite-State Morphology. Morphology is a domain of linguistics that studies the formation of words. It is traditional to distinguish between surface forms and their analyses, called lemmas.
A finite-state transducer (FST) has arcs labeling the input and output labels. A path through the transducer maps an input label sequence to an output label sequence. WFST is the weighted version of the finite-state transducer.
Definition 1 A Finite State Transducer (FST) is a 5-tuple T = (Q,Σ,Γ, δ, s, γ) where • Q is a finite set of states, • Σ is a finite set of input symbols, • Γ is a finite set of output symbols, • δ:Q × Σ → Q is the transition function, • s ∈ Q is the start state. γ:Q → Γ∗ is the output function.
What are the "benefits" of Java, for your purposes? What specific problem does that platform solve that you need? What is the performance constraint you must consider? Were the "comparisons" fair, because Java is actually extremely difficult to benchmark. So is C++, but you can at least get some algorithmic boundary guarantees from STL.
I suggest you look at OpenFst and the AT&T finite-state transducer tools. There are others out there, but I think your worry about Java puts the cart before the horse-- focus on what solves your problem well.
Good luck!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With