
An algorithm for declension of nouns of Polish/Slavic languages

Attention! Knowing Polish, or any other natural language with rich inflection and preferably a case system (German, for instance), will help a lot in answering this question. In particular, the Polish declension system is very similar to those of other Slavic languages such as Russian, Czech, and Serbian.

Have a look at this unfinished Polish declinator: declinator.com. I am planning to extend it to other languages, namely Russian and Latin, but for now I am struggling with Polish.

Besides having a large database of declensions for hundreds of nouns, I support declining nouns that do not exist. The best solution I have come up with so far is simply to check the endings of the nouns so that they can be declined accordingly.

In my code, this comes down to the calculateDeclination method, which I call when the noun is not in the database. The body of the method looks like this:

    if (areLast2Letters(word, "il"))
        declinator = new KamilDeclinator(word);
    else if (areLast2Letters(word, "sk"))
        declinator = new DyskDeclinator(word);
    else if (isLastLetter(word, 'm'))
        declinator = new RealizmDeclinator(word);

and so on. These are only the first three of the dozens of else if clauses this method has.

The code of an example declinator looks like this:

import static declining.utils.StringUtils.*;

public class RealizmDeclinator extends realizm_XuXowiX_XemXieXieDeclinator {

    public RealizmDeclinator(String noun) {
        super(noun);
    }

    @Override
    protected String calculateStem() {
        return word;
    }

    @Override
    public String calculateLocative() {
        return swap2ndFromEnd(stem, "ź") + "ie";
    }

    @Override
    public String calculateVocative() {
        return swap2ndFromEnd(stem, "ź") + "ie";
    }
}

So here is the question: is there another, more elegant algorithm for declining Polish words? Does it have to have so many if/else clauses? Do I have to write a separate declinator for each type of noun?

This problem showed me how simple yet incredibly numerous Polish declension rules are. It made my algorithm boring and monotonous. Hopefully, one of you can help me make it interesting and concise!

Cheers

GA1 asked May 26 '16 21:05


2 Answers

Although I am a native Polish speaker, my answer pertains to the code patterns in your program. As others have pointed out, tables are the way to go. However, you may also try refactoring the long if/else blocks using the Command pattern. See this page for a diagram.
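To make the table idea concrete, here is a minimal sketch of how the if/else chain could become an ordered suffix table whose values are small factory objects (in the spirit of the Command pattern). Everything named here is hypothetical and only for illustration: SuffixTable, ParadigmDeclinator, and RULES are not from the question's code, and in the real program the factories would presumably be constructor references such as KamilDeclinator::new.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical replacement for the long if/else chain: an ordered suffix table.
final class SuffixTable {
    // LinkedHashMap keeps insertion order, so rules are tried in the same
    // order as the original else-if clauses (more specific suffixes first).
    private static final Map<String, Function<String, ParadigmDeclinator>> RULES =
            new LinkedHashMap<>();

    static {
        // Assumption: each factory stands in for a real constructor,
        // e.g. KamilDeclinator::new in the question's code base.
        RULES.put("il", word -> new ParadigmDeclinator("Kamil", word));
        RULES.put("sk", word -> new ParadigmDeclinator("Dysk", word));
        RULES.put("m", word -> new ParadigmDeclinator("Realizm", word));
    }

    // Returns the declinator of the first rule whose suffix matches,
    // or an empty Optional when no rule applies.
    static Optional<ParadigmDeclinator> calculateDeclination(String word) {
        for (Map.Entry<String, Function<String, ParadigmDeclinator>> rule : RULES.entrySet()) {
            if (word.endsWith(rule.getKey())) {
                return Optional.of(rule.getValue().apply(word));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String noun : new String[] {"Kamil", "dysk", "realizm", "kot"}) {
            String paradigm = calculateDeclination(noun)
                    .map(d -> d.paradigm)
                    .orElse("(no rule)");
            System.out.println(noun + " -> " + paradigm);
        }
    }
}

// Hypothetical stand-in for the question's declinator classes: it only
// records which paradigm was chosen for which word.
final class ParadigmDeclinator {
    final String paradigm;
    final String word;

    ParadigmDeclinator(String paradigm, String word) {
        this.paradigm = paradigm;
        this.word = word;
    }
}
```

With this shape, adding a new noun type means adding one table row rather than another else if clause, and the table itself could later be loaded from a data file.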

lukeg answered Oct 19 '22 20:10


I believe the right way to do this is to reproduce an algorithm (with many utility functions and conditions) from a good morphology book and then refine it against a large dictionary used as a unit test.
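As a rough sketch of what "a dictionary as a unit test" could look like, the code below runs a decliner over a word list with known forms and collects every mismatch. The names DictionaryCheck, mismatches, sampleDictionary, and naiveGenitive are hypothetical, the three-word dictionary is only a tiny illustrative sample (a real check would load thousands of entries), and the deliberately naive rule "append -u" stands in for the algorithm under test.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical regression check: compare a decliner against known forms.
final class DictionaryCheck {
    // Returns one message per dictionary entry the decliner gets wrong.
    static List<String> mismatches(Map<String, String> expectedGenitives,
                                   UnaryOperator<String> genitive) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> entry : expectedGenitives.entrySet()) {
            String actual = genitive.apply(entry.getKey());
            if (!actual.equals(entry.getValue())) {
                failures.add(entry.getKey() + ": expected " + entry.getValue()
                        + ", got " + actual);
            }
        }
        return failures;
    }

    // Illustrative sample only: nominative -> genitive singular.
    static Map<String, String> sampleDictionary() {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("dysk", "dysku");
        dictionary.put("realizm", "realizmu");
        dictionary.put("Kamil", "Kamila");
        return dictionary;
    }

    // Deliberately naive rule under test (an assumption for this sketch).
    static String naiveGenitive(String nominative) {
        return nominative + "u";
    }

    public static void main(String[] args) {
        // Each printed line is a word the current rules still decline wrongly.
        for (String failure : mismatches(sampleDictionary(), DictionaryCheck::naiveGenitive)) {
            System.out.println(failure);
        }
    }
}
```

The list of mismatches shows which rule to add or fix next, and rerunning the check after every change guards against regressions in words that already worked.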

Updated link to my Russian declination library: https://github.com/georgy7/RussianNounsJS

Gosha U. answered Oct 19 '22 21:10