Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing a full name into its constituents

We are in need of developing a back end application that can parse a full name into

Prefix (Dr. Mr. Ms. etc)
First Name
Last Name
Middle Name
etc

Challenge here is that it has to support names of multiple countries and languages. One assumption that we have is we will always get a country and language along with the full name as input.

The full name may come in any format. For the same country / language combination, it may come in with first name last name or the reverse. Comma will not be a part of the Full Name.

Is is feasible? We are also open to any commercially available software.

like image 296
prabhu Avatar asked Apr 08 '11 14:04

prabhu


2 Answers

I think this is impossible. Consider Ralph Vaughan Williams. His family name is "Vaughan Williams" and his first name is "Ralph". Contrast this with Charles Villiers Stanford, whose family name is "Stanford", with first name "Charles" and middle name "Villiers".

Both are English-speaking composers from England, so country and language information is not sufficient to establish the correct parsing logic.

like image 192
phoog Avatar answered Oct 21 '22 19:10

phoog


Since the OP was open to any commercially available offering...

The "IBM InfoSphere Global Name Analytics" appears to be a commercial solution satisfying the original request for the parsing of a [free-form unstructured] personal name [full name]; apparently with a degree of certainty in regards to resolving some of the name ambiguity issues alluded to in other responses.
Note: I have no personal experience nor association with the product, I had merely encountered this discussion and the following reference links while re-investigating effectively the same concern as described by the OP. HTH.

A general product documentation link:
http://publib.boulder.ibm.com/infocenter/gnrgna/v4r1m0/topic/com.ibm.gnr.gna.ic.doc/topics/gnr_gna_con_gnaoverview.html

Refer to the "Parsing names using NameParser" at
http://publib.boulder.ibm.com/infocenter/gnrgna/v4r1m0/topic/com.ibm.gnr.gna.ic.doc/topics/gnr_np_con_parsingnamesusingnameparser.html

The NameParser is a component API for the product per
http://publib.boulder.ibm.com/infocenter/gnrgna/v4r1m0/topic/com.ibm.gnr.gna.ic.doc/topics/gnr_gnm_con_logicalarchitecturecapis.html

Refer to the "Parsing names using IBM NameWorks" at
http://publib.boulder.ibm.com/infocenter/gnrgna/v4r1m0/topic/com.ibm.gnr.gna.ic.doc/topics/gnr_gnm_con_parsingnamesusingnameworks.html

"IBM NameWorks combines the individual IBM InfoSphere Global Name Recognition components into a single, unified, easy-to-use application programming interface (API), and also extends this functionality to Java applications and as a Web service"

http://publib.boulder.ibm.com/infocenter/gnrgna/v4r1m0/topic/com.ibm.gnr.gna.ic.doc/topics/gnr_gnm_con_logicalarchitecturenwapis.html

To clarify why I think this answers the question, ameliorating some of the previous alluded difficulties in accomplishing the task... If I understood correctly what I read, the APIs use the "NameHunter Server" to search the "IBM InfoSphere Global Name Data Archive (NDA)" which is described as "a collection of nearly one billion names from around the world, along with gender and country of association for each name. This large repository of name information powers the algorithms and rules that IBM InfoSphere Global Name Recognition products use to categorize, classify, parse, genderize , and match names."

FWiW I also ran across a "Name Parser" which uses a database of ~140K names as noted at:
http://www.melissadata.com/dqt/websmart-web-services.htm

like image 28
CRPence Avatar answered Oct 21 '22 19:10

CRPence