How can I correctly prefix a word with "a" and "an"?

I have a .NET application where, given a noun, I want it to correctly prefix that word with "a" or "an". How would I do that?

Before you think the answer is to simply check if the first letter is a vowel, consider phrases like:

  • an honest mistake
  • a used car

ryeguy asked Aug 17 '09 14:08



1 Answer

  1. Download Wikipedia.
  2. Unzip it and write a quick filter program that emits only the article text (the download is generally in XML format and includes non-article metadata too).
  3. Find all instances of "a ..." and "an ...", and build an index on the following word and all of its prefixes (a simple trie works for this). The index should be case-sensitive, and you'll need a maximum word length, say 15 letters.
  4. (Optional) Discard all prefixes that occur fewer than 5 times or where "a" vs. "an" achieves less than a 2/3 majority (or some other thresholds; tweak here). Preferably keep the empty prefix to avoid corner cases.
  5. You can optimize the prefix database by discarding every prefix whose parent carries the same "a" or "an" annotation.
  6. When determining whether to use "a" or "an", find the longest matching prefix and follow its lead. If you didn't discard the empty prefix in step 4, there will always be a matching prefix (namely the empty one); otherwise you may need a special case for a string that matches nothing at all (such input should be very rare).
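
To make steps 3, 4 and 6 concrete, here is a minimal Python sketch (Python only for brevity; it is not the implementation mentioned in the edit below). It uses a plain dictionary keyed by prefix rather than a real trie, skips the step 5 optimization, and runs on a toy string instead of a Wikipedia dump. The function names, the regular expression and the toy corpus are all assumptions made for illustration.

```python
import re
from collections import defaultdict

MAX_PREFIX = 15  # maximum prefix length, as suggested in step 3


def build_prefix_counts(text):
    """Step 3: count how often "a" vs. "an" precedes each prefix of the next word."""
    counts = defaultdict(lambda: [0, 0])  # prefix -> [a_count, an_count]
    # The article is matched in either case; the following word keeps its case.
    for match in re.finditer(r"\b([Aa]n?)\s+(\S+)", text):
        article, word = match.group(1).lower(), match.group(2)
        slot = 1 if article == "an" else 0
        for length in range(min(len(word), MAX_PREFIX) + 1):
            counts[word[:length]][slot] += 1  # length 0 is the empty prefix
    return counts


def prune(counts, min_count=5, min_majority=2 / 3):
    """Step 4: keep frequent, decisive prefixes; always keep the empty prefix."""
    table = {}
    for prefix, (a, an) in counts.items():
        total = a + an
        decisive = total >= min_count and max(a, an) / total >= min_majority
        if prefix == "" or decisive:
            table[prefix] = "an" if an > a else "a"
    return table


def article_for(word, table):
    """Step 6: follow the longest prefix of `word` that is in the table."""
    for length in range(min(len(word), MAX_PREFIX), -1, -1):
        choice = table.get(word[:length])
        if choice is not None:
            return choice
    return "a"  # only reached if the empty prefix is missing from the table


# Toy corpus (an assumption for the demo); a real run would feed article text.
corpus = ("an honest mistake. a used car. an hour ago. "
          "a house. a unanimous vote. an unanticipated result.")
table = prune(build_prefix_counts(corpus), min_count=1)
print(article_for("honesty", table), article_for("user", table))
```

With so little data the thresholds have to be lowered (`min_count=1`); on a real corpus the defaults from step 4 are the natural starting point. Note that unseen words such as "honesty" are still resolved through the prefix they share with a seen word.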

You probably can't get much better than this - and it'll certainly beat most rule-based systems.

Edit: I've implemented this in JavaScript and C#. You can try it in your browser, or download the small, reusable JavaScript implementation it uses. The .NET implementation is the AvsAn package on NuGet. The implementations are trivial, so it should be easy to port to any other language if necessary.

Turns out the "rules" are quite a bit more complex than I thought:

  • it's an unanticipated result but it's a unanimous vote
  • it's an honest decision but a honeysuckle shrub
  • Symbols: It's an 0800 number, or an ∞ of oregano.
  • Acronyms: It's a NASA scientist, but an NSA analyst; a FIAT car but an FAA policy.

...which just goes to underline that a rule-based system would be tricky to build!
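
For contrast, here is a small Python check of the naive "first letter is a vowel" rule against the examples listed above. The function name and the word list layout are assumptions for illustration; the expected articles are taken directly from the examples.

```python
VOWELS = ("a", "e", "i", "o", "u")


def naive_article(word):
    """Naive rule: "an" if the first letter is a vowel, otherwise "a"."""
    return "an" if word[:1].lower() in VOWELS else "a"


# (word, article used in the examples above)
examples = [
    ("unanticipated", "an"), ("unanimous", "a"),
    ("honest", "an"), ("honeysuckle", "a"),
    ("0800", "an"),
    ("NASA", "a"), ("NSA", "an"),
    ("FIAT", "a"), ("FAA", "an"),
]

# Words where the naive rule disagrees with the example usage.
misses = [word for word, expected in examples if naive_article(word) != expected]
print(misses)
```

The naive rule gets "unanimous", "honest", "0800", "NSA" and "FAA" wrong, which is more than half of this small list.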

Eamon Nerbonne answered Oct 06 '22 00:10