
Regex to remove periods in acronyms?

Tags: python, regex

I want to remove periods in acronyms from a string of text, but I also want to leave regular periods (at the end of a sentence, for example) intact.

So the following sentence:

"The C.I.A. is a department in the U.S. Government."

Should become

"The CIA is a department in the US Government."

Is there a clean way to do this using Python? So far I have a two-step process:

import re

words = "The C.I.A. is a department in the U.S. Government."
# Step 1: drop the trailing period of each acronym
words = re.sub(r'([A-Z]\.[A-Z.]*)\.', r'\1', words)
print(words)
# The C.I.A is a department in the U.S Government.
# Step 2: drop the remaining periods that precede a capital letter
words = re.sub(r'\.([A-Z])', r'\1', words)
print(words)
# The CIA is a department in the US Government.
mgoldwasser asked Oct 22 '16 20:10



1 Answer

Probably this?

>>> import re
>>> s = "The C.I.A. is a department in the U.S. Government."
>>> re.sub(r'(?<!\w)([A-Z])\.', r'\1', s)
'The CIA is a department in the US Government.'

Replace each dot that directly follows a single uppercase letter, provided that letter is not immediately preceded by any character in the \w set. The latter criterion is enforced by the negative lookbehind assertion (?<!\w).
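
To make the lookbehind's effect easier to see, here is a minimal sketch that wraps the substitution in a function and runs it next to a stricter variant. The function names and the second pattern are assumptions added for illustration, not part of the answer: the variant only rewrites runs of two or more dotted capitals.

```python
import re

# Pattern from the answer above: a capital letter not preceded by a word
# character, followed by a literal dot.
SINGLE_LETTER_DOT = re.compile(r'(?<!\w)([A-Z])\.')

# Assumed stricter variant (not from the answer): two or more "X." units in a row.
MULTI_LETTER_ACRONYM = re.compile(r'(?<!\w)(?:[A-Z]\.){2,}')


def strip_acronym_periods(text):
    """Apply the answer's substitution: keep the letter, drop its dot."""
    return SINGLE_LETTER_DOT.sub(r'\1', text)


def strip_multi_letter_acronyms(text):
    """Hypothetical variant: only touch runs of at least two dotted capitals."""
    return MULTI_LETTER_ACRONYM.sub(lambda m: m.group(0).replace('.', ''), text)


sentence = "The C.I.A. is a department in the U.S. Government."
print(strip_acronym_periods(sentence))
print(strip_multi_letter_acronyms(sentence))
print(strip_acronym_periods("We chose plan B."))
print(strip_multi_letter_acronyms("We chose plan B."))
```

Both functions give the desired result for the example sentence. They differ on a lone capital: the answer's pattern also removes the dot in "plan B." or "J. Smith", while the stricter variant leaves those alone. Neither can tell an acronym's own final dot from a sentence-ending period, so a sentence that ends with "C.I.A." comes back without its closing period.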

Moses Koledoye answered Sep 21 '22 00:09