I have to check if a string can be derived from a given context free that is in Chomsky normal form. I'm using C++.
There is very nice pseudocode on the Wikipedia article covering the CYK algorithm, but I can't understand it very well.
Would someone be so kind to help me out by giving me another pseudocode for CYK algorithm, or maybe explain the one in the wiki article?
CYK algorithm is used to find whether the given Context free grammar generates a given string or not. S is present in the last cell so the string is valid.
Cocke-Younger-Kasami Algorithm The algorithm is based on the principle that the solution to problem [i, j] can constructed from solution to subproblem [i, k] and solution to sub problem [k, j]. The algorithm requires the Grammar G to be in Chomsky Normal Form (CNF).
The CYK algorithm takes as input a CFG that's in Chomsky normal form. That means that every production either has the form
Now, imagine you have a string w and you want to see whether you can derive it from a grammar whose start symbol is S. There are two options:
Notice that option (2) here ends up being a recursive parsing problem: to see whether you can derive w from S, see whether you can derive x from A and y from B.
With that insight, here's pseudocode for recursive function you can use to see whether a nonterminal S derives a string w:
bool canDerive(nonterminal S, string w) {
return canDeriveRec(S, w, 0, w.size());
}
/* Can you derive the substring [start, end) of w from S? */
bool canDeriveRec(nonterminal S, string w, int start, int end) {
/* Base case: Single characters */
if (end - start == 1) {
return whether there is a production S -> a, where a = w[start];
}
/* Recursive case: Try all possible splits */
for (each production S -> AB) {
for (int mid = start + 1; mid < end; mid++) {
if (canDeriveRec(A, w, start, mid) &&
canDeriveRec(B, w, mid, end)) {
return true;
}
}
}
return false;
}
This algorithm works correctly, but if you map out the shape of the recursion you'll find that
In fact, the number of distinct possible calls is O(n2 N), where n is the length of the input string (for each possible combination of a start and end index) and N is the number of nonterminals in the grammar. These observations suggest that this algorithm would benefit either from memoization or dynamic programming, depending on which approach you happen to think is nicer.
The CYK algorithm is what you get when you take the above recursive algorithm and memoize the result, or equivalently when you convert the above recursive algorithm into a dynamic programming problem.
There are O(n2 N) possible recursive calls. For each production tried, it does O(n) work. If there are P productions, on average, for a nonterminal, this means the overall runtime is O(n3 NP), which is O(n3) for a fixed grammar.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With