
SQL: parse the first, middle and last name from a fullname field

How do I parse the first, middle, and last name out of a fullname field with SQL?

I need to try to match up on names that are not a direct match on full name. I'd like to be able to take the full name field and break it up into first, middle and last name.

The data does not include any prefixes or suffixes. The middle name is optional. The data is formatted 'First Middle Last'.

I'm interested in some practical solutions to get me 90% of the way there. As it has been stated, this is a complex problem, so I'll handle special cases individually.

Even Mien asked Oct 01 '08 20:10



People also ask

How do I fetch first name and last name in SQL?

Code: SELECT LEFT(NAME, CHARINDEX(' ', NAME) - 1) AS 'FirstName', REVERSE(SUBSTRING(REVERSE(NAME), 1, CHARINDEX(' ', REVERSE(NAME)) - 1)) AS 'LastName'

How do I split a string in SQL?

The STRING_SPLIT(string, separator) function in SQL Server splits the string in the first argument by the separator in the second argument. To split a sentence into words, specify the sentence as the first argument of the STRING_SPLIT() function and ' ' as the second argument. FROM STRING_SPLIT( 'An example sentence.


2 Answers

Here is a self-contained example, with easily manipulated test data.

With this example, if you have a name with more than three parts, then all the "extra" stuff will get put in the LAST_NAME field. An exception is made for specific strings that are identified as "titles", such as "DR", "MRS", and "MR".

If the middle name is missing, then you just get FIRST_NAME and LAST_NAME (MIDDLE_NAME will be NULL).

You could smash it into a giant nested blob of SUBSTRINGs, but readability is hard enough as it is when you do this in SQL.

Edit: handle the following special cases:

1 - The NAME field is NULL

2 - The NAME field contains leading / trailing spaces

3 - The NAME field has more than one consecutive space within the name

4 - The NAME field contains ONLY the first name

5 - Include the original full name in the final output as a separate column, for readability

6 - Handle a specific list of prefixes as a separate "title" column

SELECT
   FIRST_NAME.ORIGINAL_INPUT_DATA
  ,FIRST_NAME.TITLE
  ,FIRST_NAME.FIRST_NAME
  ,CASE WHEN 0 = CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
        THEN NULL  --no more spaces?  assume rest is the last name
        ELSE SUBSTRING(
                        FIRST_NAME.REST_OF_NAME
                       ,1
                       ,CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)-1
                      )
        END AS MIDDLE_NAME
  ,SUBSTRING(
              FIRST_NAME.REST_OF_NAME
             ,1 + CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
             ,LEN(FIRST_NAME.REST_OF_NAME)
            ) AS LAST_NAME
FROM
  (
  SELECT
     TITLE.TITLE
    ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
          THEN TITLE.REST_OF_NAME --No space? return the whole thing
          ELSE SUBSTRING(
                          TITLE.REST_OF_NAME
                         ,1
                         ,CHARINDEX(' ',TITLE.REST_OF_NAME)-1
                        )
     END AS FIRST_NAME
    ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
          THEN NULL  --no spaces @ all?  then 1st name is all we have
          ELSE SUBSTRING(
                          TITLE.REST_OF_NAME
                         ,CHARINDEX(' ',TITLE.REST_OF_NAME)+1
                         ,LEN(TITLE.REST_OF_NAME)
                        )
     END AS REST_OF_NAME
    ,TITLE.ORIGINAL_INPUT_DATA
  FROM
    (
    SELECT
      --if the first three characters are in this list,
      --then pull it as a "title".  otherwise return NULL for title.
      CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
           THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,1,3)))
           ELSE NULL
           END AS TITLE
      --if you change the list, don't forget to change it here, too.
      --so much for the DRY principle...
     ,CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
           THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,4,LEN(TEST_DATA.FULL_NAME))))
           ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME))
           END AS REST_OF_NAME
     ,TEST_DATA.ORIGINAL_INPUT_DATA
    FROM
      (
      SELECT
        --trim leading & trailing spaces before trying to process
        --disallow extra spaces *within* the name
        REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)),'  ',' '),'  ',' ') AS FULL_NAME
       ,FULL_NAME AS ORIGINAL_INPUT_DATA
      FROM
        (
        --if you use this, then replace the following
        --block with your actual table
              SELECT 'GEORGE W BUSH' AS FULL_NAME
        UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME
        UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME
        UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME
        UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME
        UNION SELECT 'TOMMY' AS FULL_NAME
        UNION SELECT 'BILLY' AS FULL_NAME
        UNION SELECT NULL AS FULL_NAME
        UNION SELECT ' ' AS FULL_NAME
        UNION SELECT '    JOHN  JACOB     SMITH' AS FULL_NAME
        UNION SELECT ' DR  SANJAY       GUPTA' AS FULL_NAME
        UNION SELECT 'DR JOHN S HOPKINS' AS FULL_NAME
        UNION SELECT ' MRS  SUSAN ADAMS' AS FULL_NAME
        UNION SELECT ' MS AUGUSTA  ADA   KING ' AS FULL_NAME
        ) RAW_DATA
      ) TEST_DATA
    ) TITLE
  ) FIRST_NAME
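One caveat on special case 3: two passes of REPLACE('  ',' ') reduce a run of up to four spaces to a single space, but a run of five or more can still leave a doubled space behind. A minimal sketch of one common workaround follows, which collapses runs of any length by marking spaces with a two-character token. It assumes the characters '<' and '>' never occur in your name data (my assumption, not something the question states), and it reuses two sample values from the test data above.

```sql
-- Assumption: '<' and '>' never appear in the name data.
SELECT
    RAW_DATA.FULL_NAME AS ORIGINAL_INPUT_DATA
   ,REPLACE(
      REPLACE(
        REPLACE(LTRIM(RTRIM(RAW_DATA.FULL_NAME)), ' ', '<>')  -- mark every space
       ,'><', '')                                             -- drop the seams between adjacent markers
     ,'<>', ' ')                                              -- turn each remaining marker into one space
    AS FULL_NAME
FROM
  (
            SELECT '    JOHN  JACOB     SMITH' AS FULL_NAME
  UNION ALL SELECT ' DR  SANJAY       GUPTA' AS FULL_NAME
  ) RAW_DATA;
```

If that assumption holds for your data, this expression could stand in for the nested REPLACE in the innermost SELECT; the rest of the query would not need to change.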
JosephStyons answered Sep 23 '22 01:09



It's difficult to answer without knowing how the "full name" is formatted.

It could be "Last Name, First Name Middle Name" or "First Name Middle Name Last Name", etc.

Basically, you'll have to use the SUBSTRING function:

SUBSTRING ( expression , start , length )

and probably the CHARINDEX function:

CHARINDEX (substr, expression)

to figure out the start and length for each part you want to extract.
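To make the start/length arithmetic concrete, here is a small sketch for a value with exactly three space-separated parts. It uses the optional third argument of CHARINDEX (the position to start searching from) to find the second space. The variable names are made up for illustration, and the sample value is borrowed from the test data in the answer above.

```sql
-- Sample value with exactly three parts; this sketch does not handle other shapes.
DECLARE @fullname VARCHAR(100) = 'GEORGE W BUSH';

-- Position of the first space, then of the next space after it.
DECLARE @space1 INT = CHARINDEX(' ', @fullname);
DECLARE @space2 INT = CHARINDEX(' ', @fullname, @space1 + 1);

SELECT
    @space1 AS FirstSpace,
    @space2 AS SecondSpace,
    SUBSTRING(@fullname, 1, @space1 - 1)                     AS FirstName,   -- everything before the first space
    SUBSTRING(@fullname, @space1 + 1, @space2 - @space1 - 1) AS MiddleName,  -- text between the two spaces
    SUBSTRING(@fullname, @space2 + 1, LEN(@fullname))        AS LastName;    -- everything after the second space
```

For this value the spaces are at positions 7 and 9, so the three parts come out as GEORGE, W and BUSH.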

So, assuming the format is "First Name Last Name", you could try something like this (untested, but should be close):

SELECT
  SUBSTRING(fullname, 1, CHARINDEX(' ', fullname) - 1) AS FirstName,
  SUBSTRING(fullname, CHARINDEX(' ', fullname) + 1, LEN(fullname)) AS LastName
FROM YourTable
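One thing to watch for: CHARINDEX returns 0 when there is no space, so the first SUBSTRING ends up with a length of -1, which SQL Server rejects with an invalid length error. A hedged sketch of a guarded variant follows. Returning the whole value as FirstName and NULL as LastName in that case is my own choice, and the inline VALUES list is only there to stand in for YourTable so the example runs by itself.

```sql
SELECT
    CASE WHEN CHARINDEX(' ', fullname) = 0
         THEN fullname   -- no space: treat the whole value as the first name
         ELSE SUBSTRING(fullname, 1, CHARINDEX(' ', fullname) - 1)
    END AS FirstName,
    CASE WHEN CHARINDEX(' ', fullname) = 0
         THEN NULL       -- no space: there is no last name to return
         ELSE SUBSTRING(fullname, CHARINDEX(' ', fullname) + 1, LEN(fullname))
    END AS LastName
FROM (VALUES ('ALEXANDER HAMILTON'), ('TOMMY')) AS YourTable(fullname);  -- stand-in for the real table
```

With a middle name present, everything after the first space still lands in LastName, just as in the unguarded query.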
neonski answered Sep 23 '22 01:09
