I am interested in validating or automatically correcting the use of the indefinite articles "a" and "an" in blocks of English text from a textarea.
The grammatical rule is that the choice of article depends on the sound that begins the next word. Details here and here. That seems incredibly broad. However, a previous answer (How can I correctly prefix a word with "a" and "an"?) suggested referencing a huge database of English text to derive heuristics for inferring the correct indefinite article in a given situation. Eamon Nerbonne comments that he has done this, so how can I apply that solution to this practical implementation?
The function I have so far implements the simplest part of the grammatical rule: it uses "an" when the following word starts with a vowel letter, and "a" otherwise. It also respects the existing capitalization of the article. In actual use, though, this isn't practical because the exceptions to that rule are very common. For example, "a horse" is correct while "a honour" and "a HTTP address" are not.
How can my function be expanded to properly handle actual pronunciation of words following the articles, including silent letters, acronyms, and "sometimes-y"? I don't require 100% accuracy - something better than 80% would be enough to improve the text I'm correcting.
Here's my fixArticles() function; see the snippet for a running example.
function fixArticles( txt ) {
  var valTxt = txt.replace(/\b(a|an) (\w*)\b/gim, function( match, article, following ) {
    // Keep the first letter so the article's original capitalization survives.
    var newArticle = article.charAt(0);
    switch (following.charAt(0).toLowerCase()) {
      case 'a':
      case 'e':
      case 'i':
      case 'o':
      case 'u':
        newArticle += 'n'; // an
        break;
      default:
        // a
        break;
    }
    // Highlight articles that were changed.
    if (newArticle !== article) {
      newArticle = "<span class='changed'>" + newArticle + "</span>";
    }
    return newArticle+' '+following;
  });
  document.getElementById('output-text').innerHTML = valTxt.replace(/\n/gm,'<br/>');
}
input, label {
display:block;
}
.changed {
font-weight: bold;
}
<label for="input-text">Enter text</label>
<textarea id="input-text" cols="50" rows="5">An wise man once said: "A apple an day keeps the doctor away."
Give me an break.
I would like an apple.
My daughter wants a hippopotamus for Christmas.
It was an honest error.
Did a user click the button?
An MSDS (material safety data sheet) was used to record the data.
</textarea>
<input type="button" value="Fix a/an" onClick="fixArticles(document.getElementById('input-text').value)">
<hr>
<div id="output-text"></div>
The expected output for the sample input is:
A wise man once said: "An apple a day keeps the doctor away."
Give me a break.
I would like an apple.
My daughter wants a hippopotamus for Christmas.
It was an honest error.
Did a user click the button?
An MSDS (material safety data sheet) was used to record the data.
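To see where the vowel-letter rule falls short on this sample, here is a minimal sketch of a DOM-free variant of the function above. The name naiveFixArticles is made up for illustration; it applies the same rule but returns a plain string with no highlighting markup, so the result can be compared directly with the expected output.

```javascript
// Hypothetical helper: the question's vowel-letter rule, without the DOM
// update or the <span class='changed'> markup.
function naiveFixArticles(txt) {
  return txt.replace(/\b(a|an) (\w*)\b/gi, function (match, article, following) {
    // Keep the first letter so the original capitalization is preserved.
    var newArticle = article.charAt(0);
    if (/^[aeiou]/i.test(following)) {
      newArticle += 'n'; // an
    }
    return newArticle + ' ' + following;
  });
}

// Lines the simple rule handles correctly:
console.log(naiveFixArticles('Give me an break.'));            // Give me a break.

// Lines where it "corrects" text that was already right:
console.log(naiveFixArticles('It was an honest error.'));      // It was a honest error.
console.log(naiveFixArticles('Did a user click the button?')); // Did an user click the button?
console.log(naiveFixArticles('An MSDS was used.'));            // A MSDS was used.
```

The last three lines are exactly the kinds of cases the question asks about: a silent "h", a "u" pronounced like "you", and an acronym that is spelled out.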
We call "the" the definite article and "a"/"an" the indefinite article. For example, if I say, "Let's read the book," I mean a specific book. If I say, "Let's read a book," I mean any book rather than a specific book.
"A" goes before words that begin with a consonant sound. "An" goes before words that begin with a vowel sound: an apricot, an egg.
No article is used when a plural countable noun is generic or nonspecific. No article is used when a noncount noun is generic or nonspecific.
Use “a” before words where you pronounce the letter “H” such as “a hat,” “a house” or “a happy cat.” Use “an” before words where you don't pronounce the letter “H” such as “an herb,” “an hour,” or “an honorable man.”
Prompted by the flippant answer to How can I correctly prefix a word with "a" and "an"?, Eamon Nerbonne took the advice given there and produced an efficient algorithm that accurately identifies the correct indefinite article to use before any following text. So thanks @JayMEE for the pointer; it did actually help.
Implementation of the algorithm is outside the scope of basic Q & A - you can read about it in Eamon's blog entry and GitHub repository. However, it's dead simple to use!
Here's how fixArticles() can be modified to use the simple, minified version of Eamon's code, AvsAn-simple.min.js. See the JSFiddle Demo.
function fixArticles(txt) {
  var valTxt = txt.replace(/\b(a|an) ([\s\(\"'“‘-]?\w*)\b/gim, function(match, article, following) {
    var input = following.replace(/^[\s\(\"'“‘-]+|\s+$/g, ""); //strip initial punctuation symbols
    var res = AvsAnSimple.query(input);
    // Reuse the original article's first letter to keep its capitalization.
    var newArticle = res.replace(/^a/i, article.charAt(0));
    if (newArticle !== article) {
      newArticle = "<span class='changed'>" + newArticle + "</span>";
    }
    return newArticle + ' ' + following;
  });
  document.getElementById('output-text').innerHTML = valTxt.replace(/\n/gm, '<br/>');
}
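For intuition only, here is a rough sketch of how a lookup keyed on the start of the following word can beat the vowel-letter rule. This is not Eamon's code or data: PREFIX_RULES, MAX_PREFIX and guessArticle are hypothetical names, and the table is a handful of hand-picked entries, whereas the approach suggested in the linked answer derives its rules from a large body of English text. The acronym check also assumes that a run of capitals is spelled out letter by letter, which is wrong for acronyms pronounced as words.

```javascript
// Hypothetical, hand-written prefix table; NOT the data set used by AvsAn.
var PREFIX_RULES = {
  hones: 'an', // honest, honesty (but not "honey")
  hono: 'an',  // honour, honorable
  hour: 'an',
  heir: 'an',
  user: 'a',
  univ: 'a',   // university, universal
  unit: 'a',   // unit, united
  euro: 'a'    // euro, European
};
var MAX_PREFIX = 5; // length of the longest key in PREFIX_RULES

function guessArticle(word) {
  // Assumption: two or more capitals are read letter by letter, so the
  // article depends on how the first letter's NAME is pronounced.
  if (/^[A-Z]{2,}\b/.test(word)) {
    return /^[AEFHILMNORSX]/.test(word) ? 'an' : 'a';
  }
  var lower = word.toLowerCase();
  // The longest matching prefix wins.
  for (var len = Math.min(lower.length, MAX_PREFIX); len > 0; len--) {
    var prefix = lower.slice(0, len);
    if (Object.prototype.hasOwnProperty.call(PREFIX_RULES, prefix)) {
      return PREFIX_RULES[prefix];
    }
  }
  // Fall back to the vowel-letter rule from the question.
  return /^[aeiou]/.test(lower) ? 'an' : 'a';
}

console.log(guessArticle('honest'));       // an
console.log(guessArticle('user'));         // a
console.log(guessArticle('MSDS'));         // an
console.log(guessArticle('hippopotamus')); // a
```

A table this small will miss most exceptions; the point is only that the decision is made on the leading characters of the next word rather than on its first letter alone.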