Retrieve definition for parenthesized abbreviation, based on letter count

I need to retrieve the definition of an acronym based on the number of letters enclosed in parentheses. For the data I'm dealing with, the number of letters in parentheses corresponds to the number of words to retrieve. I know this isn't a reliable method for getting abbreviations, but in my case it will be. For example:

String = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'

Desired output: family health history (FHH), nurse practitioner (NP)

I know how to extract parentheses from a string, but after that I am stuck. Any help is appreciated.

import re

a = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'

# Capture each parenthesized group, parentheses included.
x2 = re.findall(r'(\(.*?\))', a)

for x in x2:
    length = len(x)  # counts the two parentheses as well
    print(x, length)
tenebris silentio asked Jun 02 '19 02:06



1 Answer

Use re.finditer to get the start position of each parenthesized match, then slice the string up to that position. Split that substring into words and keep the last n words, where n is the number of letters in the abbreviation.

import re

s = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'

for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()   # position of the opening parenthesis
    abbr = match.group(1)         # text inside the parentheses
    size = len(abbr)              # one preceding word per letter
    words = s[:start_index].split()[-size:]
    definition = " ".join(words)

    print(abbr, definition)

This prints:

FHH family health history
NP nurse practitioner
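
To get the exact format requested in the question, "definition (ABBR)" joined by commas, the same idea can be wrapped in a function. This is only a minimal sketch: the name expand_abbreviations is hypothetical, and skipping empty parentheses is an assumption added here, because a slice of [-0:] would otherwise return every preceding word.

```python
import re


def expand_abbreviations(text):
    """Return 'definition (ABBR)' strings, one per parenthesized abbreviation.

    Hypothetical helper: assumes each letter in the parentheses maps to
    exactly one of the words immediately before the opening parenthesis.
    """
    results = []
    for match in re.finditer(r"\(([^()]*)\)", text):
        abbr = match.group(1)
        size = len(abbr)
        if size == 0:
            # Assumption: skip "()" because [-0:] would keep all words.
            continue
        words = text[:match.start()].split()[-size:]
        results.append(f"{' '.join(words)} ({abbr})")
    return results


s = ('Although family health history (FHH) is commonly accepted as an '
     'important risk factor for common, chronic diseases, it is rarely '
     'considered by a nurse practitioner (NP).')

print(", ".join(expand_abbreviations(s)))
# prints: family health history (FHH), nurse practitioner (NP)
```

Returning a list rather than printing inside the loop leaves the formatting to the caller, so the same function can feed a comma-joined string or a lookup table.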
Keatinge answered Nov 07 '22 07:11