I am searching for information on algorithms to process text sentences or to follow a structure when creating sentences that are valid in a normal human language such as English. I would like to know if there are projects working in this field that I can go learn from or start using.
For example, if I gave a program a noun, provided it with a thesaurus (for related words) and part-of-speech (so it understood where each word belonged in a sentence) - could it create a random, valid sentence?
I'm sure there are many sub-sections of this kind of research so any leads into this would be great.
As written communication expectations have increased, automation has become a common practice in digital spaces. Moreover, AI can quickly generate unique sentences, so you don't have to spend too much time working on them manually.
Artificial intelligence sentence example. The calculators use so-called " artificial intelligence " and artificial neural networks to make the predictions. The level of the toy's artificial intelligence means it's equipped with the ability to make over 40 noises in response to stimuli, making it quite entertaining.
Regression Algorithms These types of algorithms are used to predict future outcomes based on a set of input data.
Microsoft and Cambridge University researchers have developed artificial intelligence that can write code and called it DeepCoder. The tool can write working code after searching through a huge code database.
The field you're looking for is called natural language generation, a subfield of natural language processing http://en.wikipedia.org/wiki/Natural_language_processing
Sentence generation is either really easy or really hard depending on how good you want the sentences to be. Currently, there aren't programs that will be able to generate 100% sensible sentences about given nouns (even with a thesaurus) -- if that is what you mean.
If, on the other hand, you would be satisfied with nonsense that was sometimes ungrammatical, then you could try an n-gram based sentence generator. These just chain together of words that tend to appear in sequence, and 3-4-gram generators look quite okay sometimes (although you'll recognize them as what generates a lot of spam email).
Here's an intro to the basics of n-gram based generation, using NLTK: http://www.nltk.org/book/ch02.html#generating-random-text-with-bigrams
This is called NLG (Natural Language Generation), although that is mainly the task of generating text that describes a set of data. There is also a lot of research on completely random sentence generation as well.
One starting point is to use Markov chains to generate sentences. How this is done is that you have a transition matrix that says how likely it is to transition between every every part-of-speech. You also have the most likely starting and ending part-of-speech of a sentence. Put this all together and you can generate likely sequences of parts-of-speech.
Now, you are far from done, this will first of all not offer a very good result as you are only considering the probability between adjacent words (also called bi-grams), so what you want to do is to extend this to look for instance at the transition matrix between three parts-of-speech (this makes a 3D matrix and gives you trigrams). You can extend it to 4-grams, 5-grams, etc. depending on the processing power and if your corpus can fill such matrix.
Lastly, you need to patch up things such as object agreement (subject-verb-agreement, adjective-verb-agreement (not in English though), etc.) and tense, so that everything is congruent.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With