I have the code below:
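import nltk

# Trailing period added so the input matches the '.' token in the output below.
exampleArray = ['The dog barking.']

def processLanguage():
    for item in exampleArray:
        # Split the sentence into tokens, then tag each token with its POS.
        tokenized = nltk.word_tokenize(item)
        tagged = nltk.pos_tag(tokenized)
        print(tagged)

processLanguage()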
The output of the code above is the list of tokenized words with their corresponding part-of-speech tags. Example:
[('The', 'DT'), ('dog', 'NN'), ('barking', 'NN'), ('.', '.')]
DT = determiner
NN = noun
The text is supposed to be
The dog is barking
and supposed to have the POS sequence of
DT -> NN -> VBZ -> VBG
VBZ = verb, present tense, 3rd person singular
VBG = verb, present participle or gerund
How can I make the program locate the position of the missing word within the sentence?
This is straightforward grammar checking. You need at least a tagger, a tool that annotates each token with its part of speech (POS), and a parser, ideally something like an Earley parser (https://en.wikipedia.org/wiki/Earley_parser), that can analyse the tree structure of a sentence given a phrase structure grammar (PSG) of your target language. Whichever algorithm you choose, keep in mind that natural language is at least mildly context-sensitive in the Chomsky hierarchy, so forget about finite-state automata and the like.
If the parser does not validate your sentence as grammatical (in linguistic terms, it is not licensed by your PSG), you can use the tree structure to locate the position that is left unfilled or incorrectly filled by some terminal symbol.
In addition, you need morphological analysis and case marking, which let you check agreement between verbs and their arguments in order to rule out sentences like "the dog are barking". You may also want to look at LFG or HPSG implementations, which handle this more thoroughly, since they are computationally more powerful (context-sensitive tools, in other words linear bounded Turing machines).
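To make the idea concrete, here is a minimal sketch in plain Python, not a definitive implementation. It assumes a toy grammar over POS tags (GRAMMAR and TERMINALS are made up for this example and cover only the sentence in the question), a small hand-written Earley recognizer (is_grammatical), and a brute-force repair step (find_missing_tag) that tries inserting each tag at each position and keeps the insertions that make the sequence parse. It also assumes the tagger labels "barking" as VBG, which is not what the output in the question shows, and it leaves punctuation out.

```python
# Toy phrase structure grammar over POS tags. This is an assumption made for
# illustration only, not a real grammar of English. Empty productions are not
# supported by the recognizer below.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("DT", "NN")],
    "VP": [("VBZ", "VBG"), ("VBZ",)],
}
TERMINALS = ["DT", "NN", "VBZ", "VBG"]


def is_grammatical(tags, grammar=GRAMMAR, start="S"):
    """Earley recognizer: True if the tag sequence is licensed by the grammar."""
    # Each state is (lhs, rhs, dot position, origin index).
    chart = [set() for _ in range(len(tags) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(tags) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predict: expand the nonterminal after the dot.
                    for production in grammar[symbol]:
                        new = (symbol, production, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < len(tags) and tags[i] == symbol:
                    # Scan: the next input tag matches the expected terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every state that was waiting for lhs.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[len(tags)]
    )


def find_missing_tag(tags):
    """Return (position, tag) pairs whose insertion makes the sequence parse."""
    repairs = []
    for pos in range(len(tags) + 1):
        for tag in TERMINALS:
            candidate = tags[:pos] + [tag] + tags[pos:]
            if is_grammatical(candidate):
                repairs.append((pos, tag))
    return repairs


tags = ["DT", "NN", "VBG"]     # assumed tags for "The dog barking"
print(is_grammatical(tags))    # False
print(find_missing_tag(tags))  # [(2, 'VBZ')]
```

The result says that a VBZ is missing at index 2, i.e. between "dog" and "barking". With a realistic grammar you would probably swap the hand-written recognizer for an existing chart parser such as NLTK's nltk.parse.EarleyChartParser, and the brute-force insertion only scales to short sentences and small tag sets.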