How to deal with English contractions programmatically [Regex, JS, Ruby]

I am capturing natural language user input and I need to check it against a predefined "correct" version. This much is trivial, but I am unsure about how to handle variations in contractions in the English language.

Suppose I'm expecting the sentence "I'm positive you don't know what you're doing." The match needs to be exact, but I don't want to lock users into just one variation, as that would get frustrating fast.

So, should I manually enter every possible variation of that sentence as valid matches? Like so:

"I'm positive you don't know what you're doing."
"I am positive you don't know what you're doing."
"I am positive you do not know what you're doing."
"I am positive you do not know what you are doing."
"I'm positive you don't know what you are doing."
...

And so on. Think of more complex sentences and you can see how maddening this gets.
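Even the brute-force route would not need to be typed by hand, and generating the variants shows how quickly the list grows: each independent contraction doubles the count, so n contractions give 2^n sentences. A minimal Ruby sketch, assuming a hypothetical PAIRS table and a hypothetical all_variants helper:

```ruby
# Hypothetical table: each contraction paired with its expanded form.
PAIRS = {
  "I'm"    => "I am",
  "don't"  => "do not",
  "you're" => "you are"
}.freeze

# Build every variant of +sentence+ by choosing, independently for each
# contraction in PAIRS that appears in it, the short or the long form.
# Note: String#sub replaces only the first occurrence, so a sentence that
# repeats the same contraction is not fully covered by this sketch.
def all_variants(sentence)
  PAIRS.reduce([sentence]) do |variants, (short, long)|
    variants.flat_map do |s|
      s.include?(short) ? [s, s.sub(short, long)] : [s]
    end
  end
end

variants = all_variants("I'm positive you don't know what you're doing.")
puts variants.size # prints 8
puts variants
```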

Or is there a programmatic way to handle this with Regex, JS, Ruby, or Rails (the tools I'm using)?

Any help appreciated, thanks.

asked Apr 09 '17 at 01:04 by San Diago



1 Answer

English has a fairly small, finite set of contractions, so they can simply be listed. I would store each variation as a key that points to the same value, like this (Ruby, though the same idea works in JS):

"aren't"  => :arent
"are not" => :arent 
etc.

Then store the correct sentence using the shared values.

":im positive you :dont know what :youre doing"

When you receive input, replace every matched key with its stored value, then compare the converted sentence against the correct one, which is already stored with the marked contractions.
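A minimal Ruby sketch of that normalize-then-compare step. The CONTRACTIONS contents, the normalize and matches? helpers, and the extra cleanup (lowercasing, curly apostrophes, whitespace) are assumptions of this sketch rather than part of the answer. It uses marker strings such as ":im" instead of symbols so they can be substituted straight into the sentence, and it produces the stored sentence by normalizing the correct one rather than writing the marked form by hand.

```ruby
# Hypothetical token table: every lowercase spelling of a contraction
# maps to one shared marker string. Extend it for your own sentences.
CONTRACTIONS = {
  "i'm"     => ":im",
  "i am"    => ":im",
  "don't"   => ":dont",
  "do not"  => ":dont",
  "you're"  => ":youre",
  "you are" => ":youre",
  "aren't"  => ":arent",
  "are not" => ":arent"
}.freeze

# One regex matching any key as a whole word or phrase. Longer keys come
# first so that, if one key is a prefix of another, the longer one wins.
CONTRACTION_RE = /\b(?:#{Regexp.union(CONTRACTIONS.keys.sort_by { |k| -k.length })})\b/

def normalize(text)
  s = text.downcase.tr("\u2019", "'")   # curly apostrophe -> straight
  s = s.gsub(/\s+/, " ").strip          # collapse runs of whitespace
  s.gsub(CONTRACTION_RE, CONTRACTIONS)  # known spellings -> markers
end

# The expected sentence goes through the same normalization, which yields
# the marked form (":im positive you :dont know what :youre doing.").
def matches?(input, expected)
  normalize(input) == normalize(expected)
end

expected = "I'm positive you don't know what you're doing."
puts normalize(expected) # prints :im positive you :dont know what :youre doing.
puts matches?("I am positive you do not know what you are doing.", expected) # prints true
puts matches?("I'm sure you don't know what you're doing.", expected)        # prints false
```

Because the comparison is still on whole strings, anything not covered by the table (different words, different punctuation) must match exactly, which keeps the "exact match" requirement from the question.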

(Note: make special provisions for the few cases where different phrases share an identical contraction and you want to respond to each individually, e.g. "he'd" standing for both "he had" and "he would".)
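One possible provision, sketched in Ruby under its own assumptions: store the expected sentence fully expanded, expand each ambiguous contraction in the input every way it can be read, and accept the input if any reading matches. The AMBIGUOUS table and the expansions and ambiguous_match? helpers are hypothetical.

```ruby
# Hypothetical table: ambiguous contractions and every expansion they can
# stand for. Only a couple of examples are listed here.
AMBIGUOUS = {
  "he'd" => ["he had", "he would"],
  "it's" => ["it is", "it has"]
}.freeze

# Return every reading of +text+ (lowercased), expanding each ambiguous
# contraction all possible ways. All occurrences of one contraction are
# expanded the same way, which keeps this sketch short.
def expansions(text)
  AMBIGUOUS.reduce([text.downcase]) do |candidates, (short, longs)|
    pattern = /\b#{Regexp.escape(short)}\b/
    candidates.flat_map do |c|
      c.match?(pattern) ? longs.map { |long| c.gsub(pattern, long) } : [c]
    end
  end
end

# Lenient check: accept the input if any of its readings equals the
# expected sentence, which is stored fully expanded.
def ambiguous_match?(input, expected_expanded)
  expansions(input).include?(expected_expanded.downcase)
end

puts ambiguous_match?("He'd already left.", "he had already left.")  # prints true
puts ambiguous_match?("He'd like that.", "he would like that.")      # prints true
puts ambiguous_match?("He'd already left.", "he will already left.") # prints false
```

The number of readings doubles with each ambiguous contraction present, which is harmless for sentence-length input.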

answered Nov 11 '22 at 11:11 by גלעד ברקן
