Taking an example from Wikiversity's Introduction to Latin, consider the sentence:
the sailor gives the girl money
We can handle this in Prolog with a DCG fairly elegantly with this pile of rules:
sentence(s(NP, VP)) --> noun_phrase(NP), verb_phrase(VP).
noun_phrase(Noun) --> det, noun(Noun).
noun_phrase(Noun) --> noun(Noun).
verb_phrase(vp(Verb, DO, IO)) --> verb(Verb), noun_phrase(IO), noun_phrase(DO).
det --> [the].
noun(X) --> [X], { member(X, [sailor, girl, money]) }.
verb(gives) --> [gives].
And we see that this works:
?- phrase(sentence(S), [the,sailor,gives,the,girl,money]).
S = s(sailor, vp(gives, money, girl)) ;
It seems to me that the DCG is really optimized for handling word-order languages. I'm at a complete loss as to how to handle this Latin sentence:
nauta dat pecuniam puellae
This means the same thing (the sailor gives the girl money), but the word order is completely free: all of these permutations also mean exactly the same thing:
nauta dat puellae pecuniam
nauta puellae pecuniam dat
puellae pecuniam dat nauta
puellae pecuniam nauta dat
dat pecuniam nauta puellae
The first thing that occurs to me is to enumerate the permutations:
sentence(s(NP, VP)) --> noun_phrase(NP), verb_phrase(VP).
sentence(s(NP, VP)) --> verb_phrase(VP), noun_phrase(NP).
but this won't do: nauta belongs to the subject noun phrase, while puellae, which belongs to the object noun phrase, is subordinate to the verb yet can precede it. I wonder if I should approach it by building some kind of attributed list first, like so:
?- attributed([nauta,dat,pecuniam,puellae], Attributed).
Attributed = [noun(nauta,nom), verb(do,3,s), noun(pecunia,acc), noun(puella,dat)].
This seems like it will turn out to be necessary (and I don't see a good way to do it), but grammatically it's pushing food around on my plate. Maybe I could write a parser with some kind of horrifying non-DCG contraption like this:
parse(s(NounPhrase, VerbPhrase), Attributed) :-
    parse(subject_noun_phrase(NounPhrase), Attributed),
    parse(verb_phrase(VerbPhrase), Attributed).
parse(subject_noun_phrase(Noun), Attributed) :-
    member(noun(Noun,nom), Attributed).
parse(object_noun_phrase(Noun), Attributed) :-
    member(noun(Noun,acc), Attributed).
This seems like it would work, but only as long as I have no recursion; as soon as I introduce a subordinate clause I'm going to reuse subjects in an unhealthy way.
I just don't see how to get from a non-word-order sentence to a parse tree. Is there a book that discusses this? Thanks.
I found a related resource here (PERMUTATIONAL GRAMMAR FOR FREE WORD ORDER LANGUAGES). It seems worth reading (hey, we all hated those mandatory Latin lessons so much back in the 60s!).
The appendix contains an implementation you can test.
I forgot to point out Covington's free-word-order parser (it's just a sketch...). You can find it in the PRoNTo toolkit (I mention it here for the sake of completeness, but I'm fairly sure you already know about it).
It seems (drawing from my extremely rusty memory of high school Latin) that your lexical analyzer needs to look at each token (word) and annotate it with the appropriate metadata.
Then your parse should be guided by the metadata, since that's what ties everything together.
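To make that concrete, here is a minimal sketch of a metadata-guided parse, reusing the attributed-list shape from the question. The lex/2 table and the flat parse/2 predicate are hypothetical names made up for illustration, and the lexicon covers only the four words of the example (real Latin forms such as puellae are ambiguous and would need several entries). The point is that select/3 removes each item as it is used, so no word can fill two roles, which is the reuse problem raised in the question.

```prolog
:- use_module(library(lists)).

% Hypothetical mini-lexicon: lex(Word, Annotation).
lex(nauta,    noun(nauta, nom)).
lex(pecuniam, noun(pecunia, acc)).
lex(puellae,  noun(puella, dat)).
lex(dat,      verb(do, 3, s)).

% attributed(+Words, -Annotated): look each word up in the lexicon.
attributed([], []).
attributed([W|Ws], [A|As]) :-
    lex(W, A),
    attributed(Ws, As).

% parse(+Words, -Tree): pick constituents by case, not by position.
% Each select/3 call removes the chosen item from the list, and the
% final [] insists that every word has been consumed.
parse(Words, s(Subj, vp(Verb, DO, IO))) :-
    attributed(Words, As0),
    select(noun(Subj, nom), As0, As1),
    select(verb(Verb, 3, s), As1, As2),
    select(noun(DO, acc), As2, As3),
    select(noun(IO, dat), As3, []).
```

With this lexicon, ?- parse([puellae,pecuniam,nauta,dat], T). should give T = s(nauta, vp(do, pecunia, puella)), and the same tree for any other ordering of the four words. For subordinate clauses you would pass the leftover list on to the next clause instead of requiring [].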
You could use this meta clause:
unsorted([]) --> [].
unsorted([H|T]) -->
    H, unsorted(T).
unsorted([H|T]) -->
    unsorted(T), H.
sentence(s(NP, VP)) --> unsorted([noun_phrase(NP), verb_phrase(VP)]).
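One caveat: unsorted//1 only ever puts the head before or after everything else, so with three or more constituents it misses some orders (for [a,b,c] it never produces b, a, c), and it derives a one-element list twice. Below is a sketch of a variant that uses select/3 to try every permutation, applied to the Latin sentence. The names unordered//1, np//2 and verb//1 are assumptions for illustration, and the lexicon covers only the example words. It also treats the sentence as one flat set of constituents, so it is a starting point rather than a full free-word-order grammar.

```prolog
:- use_module(library(lists)).

% unordered(+Goals): parse the non-terminals in Goals in any order.
% select/3 picks one goal, the rest are parsed recursively.
unordered([]) --> [].
unordered(Goals) -->
    { select(G, Goals, Rest) },
    G,
    unordered(Rest).

% Hypothetical case-marked lexicon for the example sentence.
np(nom, nauta)   --> [nauta].
np(acc, pecunia) --> [pecuniam].
np(dat, puella)  --> [puellae].
verb(do)         --> [dat].

% The roles are fixed by case, so the tree is the same for any order.
sentence(s(Subj, vp(V, DO, IO))) -->
    unordered([np(nom, Subj), verb(V), np(acc, DO), np(dat, IO)]).
```

For example, ?- phrase(sentence(T), [puellae,pecuniam,dat,nauta]). should yield T = s(nauta, vp(do, pecunia, puella)), the same tree as for nauta dat pecuniam puellae.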