Replace incorrect use of "a" and "an" in text input

Tags:

regex

I am interested in validating or automatically correcting the use of the indefinite articles "a" and "an" in blocks of English text from a textarea.

The grammatical rule is that the choice of article depends on the sound that begins the next word. Details here and here. This appears incredibly broad; however, a previous answer (How can I correctly prefix a word with "a" and "an"?) suggested referencing a huge database of English text to build heuristics that infer the correct indefinite article for a given situation. Eamon Nerbonne comments that he has done this, so how can I apply that solution to this practical implementation?

The function I have so far implements the simplest part of the grammatical rule: it uses "an" when the following word starts with a vowel letter, and "a" otherwise. It also preserves the existing capitalization of the article. In actual use, though, this isn't practical because exceptions to that rule are very common. For example, "a horse" is correct, while "a honour" and "a HTTP address" are not.

How can my function be expanded to properly handle actual pronunciation of words following the articles, including silent letters, acronyms, and "sometimes-y"? I don't require 100% accuracy - something better than 80% would be enough to improve the text I'm correcting.

Here's my fixArticles() function; see the snippet for a running example.

function fixArticles( txt ) {
  var valTxt = txt.replace(/\b(a|an) (\w*)\b/gim, function( match, article, following ) {
    var newArticle = article.charAt(0);
    switch (following.charAt(0).toLowerCase()) {
      case 'a':
      case 'e':
      case 'i':
      case 'o':
      case 'u':
        newArticle += 'n'; // an
        break;
      default:
        // a
        break;
    }
    if (newArticle !== article) {
      newArticle = "<span class='changed'>" + newArticle + "</span>";
    }
    return newArticle+' '+following;

  });

  document.getElementById('output-text').innerHTML = valTxt.replace(/\n/gm,'<br/>');
}

input, label {
    display:block;
}
.changed {
  font-weight: bold;
}

<label for="input-text">Enter text</label>
<textarea id="input-text" cols="50" rows="5">An wise man once said: "A apple an day keeps the doctor away."
Give me an break.
I would like an apple.
My daughter wants a hippopotamus for Christmas.
It was an honest error.
Did a user click the button?
An MSDS (material safety data sheet) was used to record the data.
</textarea>
<input type="button" value="Fix a/an" onClick="fixArticles(document.getElementById('input-text').value)">
<hr>
<div id="output-text"></div>

The expected output for the sample input is:

A wise man once said: "An apple a day keeps the doctor away."
Give me a break.
I would like an apple.
My daughter wants a hippopotamus for Christmas.
It was an honest error.
Did a user click the button?
An MSDS (material safety data sheet) was used to record the data.

asked Dec 23 '15 16:12

Mogsdad

1 Answer

Prompted by the flippant answer to How can I correctly prefix a word with "a" and "an"?, Eamon Nerbonne took the advice given there and produced an efficient algorithm that accurately identifies the correct indefinite article to use before any following text. So thanks to @JayMEE for the pointer; it did actually help.

Implementation of the algorithm is outside the scope of basic Q & A - you can read about it in Eamon's blog entry and GitHub repository. However, it's dead simple to use!
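For intuition only, here is a minimal sketch of how a data-derived heuristic could be represented: a longest-prefix lookup, plus a letter-name rule for all-caps acronyms. This is not Eamon's code. The table ARTICLE_BY_PREFIX, its handful of entries, and the names AN_LETTERS and guessArticle are my own hypothetical illustrations; a real table would be generated from a large text corpus and would be far larger.

```javascript
// Hypothetical, hand-written table for illustration only.
// Keys are lowercase word prefixes; longer prefixes override shorter ones.
var ARTICLE_BY_PREFIX = {
  "a": "an", "e": "an", "i": "an", "o": "an", "u": "an",
  "hon": "an",   // honest, honour (silent h)
  "hour": "an",  // hour, hourly (silent h)
  "uni": "a",    // university, unit (sounds like "you")
  "use": "a"     // user, useful (sounds like "you")
};

// Letters whose spoken names begin with a vowel sound ("eff", "aitch", "em", ...).
var AN_LETTERS = "AEFHILMNORSX";

function guessArticle(word) {
  // Assumption: all-caps words are acronyms that are spelled out letter by letter.
  if (/^[A-Z]{2,}$/.test(word)) {
    return AN_LETTERS.indexOf(word.charAt(0)) >= 0 ? "an" : "a";
  }
  var lower = word.toLowerCase();
  // Try the longest prefix first, shrinking until a table entry matches.
  for (var len = lower.length; len >= 1; len--) {
    var prefix = lower.slice(0, len);
    if (Object.prototype.hasOwnProperty.call(ARTICLE_BY_PREFIX, prefix)) {
      return ARTICLE_BY_PREFIX[prefix];
    }
  }
  return "a"; // default when no prefix matches
}
```

With this toy table, guessArticle("honour") and guessArticle("HTTP") return "an", while guessArticle("horse") and guessArticle("user") return "a". It still misfires on words such as "unimportant" (the "uni" entry wins), which is exactly the kind of gap that corpus-derived data is meant to close.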

Here's how fixArticles() can be modified to use the simple, minified version of Eamon's code, AvsAn-simple.min.js. See the JSFiddle Demo.

function fixArticles(txt) {
  var valTxt = txt.replace(/\b(a|an) ([\s\(\"'“‘-]?\w*)\b/gim, function(match, article, following) {
    var input = following.replace(/^[\s\(\"'“‘-]+|\s+$/g, ""); //strip initial punctuation symbols
    var res = AvsAnSimple.query(input);
    var newArticle = res.replace(/^a/i, article.charAt(0));
    if (newArticle !== article) {
      newArticle = "<span class='changed'>" + newArticle + "</span>";
    }
    return newArticle + ' ' + following;
  });

  document.getElementById('output-text').innerHTML = valTxt.replace(/\n/gm, '<br/>');
}
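If you want to try the replacement logic outside the browser, one option is to split it from the DOM update. The sketch below keeps the same regular expressions as above but takes the lookup as a parameter. The names fixArticlesText and stubQuery are hypothetical; stubQuery is a deliberately crude stand-in that I made up for the demo, and in the page you would pass AvsAnSimple.query instead.

```javascript
// Same replacement logic as fixArticles(), but returns the marked-up string
// instead of writing to the DOM. `query` is any function that maps a word
// to "a" or "an"; in the browser it would be AvsAnSimple.query.
function fixArticlesText(txt, query) {
  return txt.replace(/\b(a|an) ([\s\(\"'“‘-]?\w*)\b/gim, function(match, article, following) {
    // Strip leading punctuation and trailing whitespace before the lookup.
    var input = following.replace(/^[\s\(\"'“‘-]+|\s+$/g, "");
    var res = query(input);
    // Keep the capitalization of the article that was originally typed.
    var newArticle = res.replace(/^a/i, article.charAt(0));
    if (newArticle !== article) {
      newArticle = "<span class='changed'>" + newArticle + "</span>";
    }
    return newArticle + ' ' + following;
  });
}

// Hypothetical stand-in for AvsAnSimple.query, only good enough for this demo.
function stubQuery(word) {
  return /^(hon|[aeio])/i.test(word) ? "an" : "a";
}
```

With the stub, fixArticlesText("Give me an break.", stubQuery) returns "Give me <span class='changed'>a</span> break.", and "It was an honest error." comes back unchanged.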

answered Sep 28 '22 08:09

Mogsdad
