Anyone know an elegant function to fix name cases?

Tags:

php

Kindergarten 101 teaches some of us that: "The letters in your name should be lowercase, with uppercase first letters." Yet in this post-literate era, how people enter their names in web forms seems to depend on their mood, or solar flares or whatnot: All uppercase, all lowercase, mixed, upside down...

Philosophically, I say whatever! Occupy your name, who cares. But I have OCD clients that prefer to see data normalized, standardized, predictable. So I'm asking you guys if you've seen any well-thought-out PHP functions for case-fixing names, that take into consideration the various exceptions that ucwords() would totally butcher, such as:

  • Sven-Alex Crumpet
  • Ronaldo McDonaldo
  • Boopsie O'Brien
  • J.R. Bob Dobbs
  • Francesca de los Gatos
  • YungCheng Li

Any functions out there that attempt to accommodate these alphabet rebels?

UPDATE
From Robin v. G.'s point of van-tage, there can be no script to rule them all. But I've decided that names entered entirely in lower or uppercase are likely candidates for a good scrubbing. So for these, I will do ...

    // Only touch names typed entirely in upper or lower case
    if ($name === strtoupper($name) || $name === strtolower($name)) {
        $name = ucwords(strtolower($name));
    }

It would be easy enough to modify this to fix a few likely exceptions: dashes, apostrophes, 'McD', etc. Mistakes will be made, but who will complain? Not the meek bastard who entered their name in lowercase.

Oh wait, my name is in lowercase...
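For what it's worth, here is a rough sketch of how that check could be extended to cover dashes, apostrophes and 'Mc'. The helper name `scrubName()` and the exact rules are my own assumptions, not anything standard, and it relies on the delimiter argument of `ucwords()`, which only exists in reasonably recent PHP versions.

```php
<?php
// scrubName() is a hypothetical helper; the rules below are illustrative guesses.
function scrubName(string $name): string
{
    // Leave mixed-case input alone: the person probably typed it deliberately.
    if ($name !== strtoupper($name) && $name !== strtolower($name)) {
        return $name;
    }

    // Capitalise after spaces, hyphens, dots and apostrophes.
    $name = ucwords(strtolower($name), " -.'");

    // "Mcdonaldo" -> "McDonaldo": capitalise the letter after a leading "Mc".
    return preg_replace_callback(
        '/\bMc([a-z])/',
        function (array $m): string {
            return 'Mc' . strtoupper($m[1]);
        },
        $name
    );
}

echo scrubName("RONALDO MCDONALDO"), "\n";      // Ronaldo McDonaldo
echo scrubName("boopsie o'brien"), "\n";        // Boopsie O'Brien
echo scrubName("Francesca de los Gatos"), "\n"; // left unchanged (mixed case)
```

Names like "Mac..." are deliberately not handled, since "Macy" and "MacDonald" cannot be told apart by a rule this simple.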

neokio asked Oct 17 '12 06:10


2 Answers

This is simply impossible.

Spelling of names varies from country to country, as you show in your question. The easiest way to go is to pick the most common convention, which is to capitalise the first letter of every 'word', i.e. every string preceded by a space, hyphen, dot or apostrophe.

This doesn't fix all your problems (YungCheng, McDonaldo) and leaves you with other issues as well, but that's as close as you're gonna get.
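As a minimal sketch of that baseline (my assumption of how you might write it, with a made-up helper name `capitaliseNameParts()`), the second argument of `ucwords()` can list the delimiter characters:

```php
<?php
// Baseline: capitalise the first letter after a space, hyphen, dot or apostrophe.
// capitaliseNameParts() is a hypothetical helper name.
function capitaliseNameParts(string $name): string
{
    // The second argument to ucwords() lists the delimiter characters.
    return ucwords(strtolower($name), " -.'");
}

$samples = ["sven-alex crumpet", "BOOPSIE O'BRIEN", "j.r. bob dobbs", "RONALDO MCDONALDO"];
foreach ($samples as $sample) {
    echo capitaliseNameParts($sample), "\n";
}
// Sven-Alex Crumpet
// Boopsie O'Brien
// J.R. Bob Dobbs
// Ronaldo Mcdonaldo
```

Note the cost of treating the apostrophe as a delimiter: a possessive like "santa's" comes out as "Santa'S", which is one of the "other issues" you inherit.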

Compare:

  • Alex Van Halen (US spelling)
  • Alex van Halen (correct Dutch spelling)

No algorithm can fix this.

This article illustrates the problem with Dutch names very well, and that's just one language. There's probably an article like this for every language in the world. ;)

Sherlock answered Oct 13 '22 18:10


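Here is an attempt:

    $names = array();
    $names[] = "sven-alex crumpet";
    $names[] = "RONALDO McDonalDO";
    $names[] = "Boopsie o'Brien";
    $names[] = "j.r. BOB DOBBS";
    $names[] = "francesca DE LOS gatOS";
    $names[] = "yungcheng LI";
    $names[] = "mr hankey";
    $names[] = "santas little helper";
    $names[] = "j.r.r. tolkien";

    // Characters that separate the parts of a name (more to come)
    $splitters = array(' ', '.', "'", '-');
    $fixedNames = array();

    foreach ($names as $name) {
        $fixed = '';
        // Mark every splitter with a placeholder so the name can be split in one go
        $blank = str_replace($splitters, '?', $name);
        $n = explode('?', $blank);
        // Capitalise each part; the parts are temporarily joined with spaces
        foreach ($n as $f) {
            $fixed .= ucfirst(strtolower($f)) . ' ';
        }
        // Drop the trailing space so $fixed, $blank and $name have the same length
        $fixed = substr($fixed, 0, -1);
        // Put the original splitter characters back in place of the temporary spaces
        for ($i = 0; $i < strlen($fixed); $i++) {
            if ($fixed[$i] == ' ' && $blank[$i] == '?') {
                $fixed[$i] = $name[$i];
            }
        }
        $fixedNames[] = $fixed;
    }

    echo '<pre>';
    print_r($fixedNames);
    echo '</pre>';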

Output:

    Array
    (
        [0] => Sven-Alex Crumpet
        [1] => Ronaldo Mcdonaldo
        [2] => Boopsie O'Brien
        [3] => J.R. Bob Dobbs
        [4] => Francesca De Los Gatos
        [5] => Yungcheng Li
        [6] => Mr Hankey
        [7] => Santas Little Helper
        [8] => J.R.R. Tolkien
    )

It is impossible to "correct" a name like YungCheng without algorithms that account for regional and cultural conventions, plus a huge name database to compare against.
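To illustrate what the database comparison might look like, here is a small sketch. The `$knownSpellings` table is a made-up two-entry stand-in for a real name database, and `applyKnownSpellings()` is a hypothetical helper name; treat it as an idea, not a solution.

```php
<?php
// $knownSpellings is a made-up, tiny stand-in for a real name database.
$knownSpellings = [
    'yungcheng' => 'YungCheng',
    'mcdonaldo' => 'McDonaldo',
];

// applyKnownSpellings() is a hypothetical helper name.
function applyKnownSpellings(string $name, array $known): string
{
    $words = explode(' ', $name);
    foreach ($words as $i => $word) {
        // Look each word up case-insensitively and swap in the stored spelling.
        $key = strtolower($word);
        if (isset($known[$key])) {
            $words[$i] = $known[$key];
        }
    }
    return implode(' ', $words);
}

echo applyKnownSpellings('Yungcheng Li', $knownSpellings), "\n";      // YungCheng Li
echo applyKnownSpellings('Ronaldo Mcdonaldo', $knownSpellings), "\n"; // Ronaldo McDonaldo
```

Even with a real database behind it, this would still guess wrong for anyone who genuinely spells their name "Yungcheng".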

davidkonrad answered Oct 13 '22 19:10