Replace special characters by equivalent

How do I replace the following special characters by their equivalent?

The vowels ÁÉÍÓÚáéíóú should become AEIOUaeiou respectively, and the letter Ñ should become N.

The expression:

str = regexprep(str,'[^a-zA-Z]','');

removes every character that is not in the alphabet, but how do I replace the accented ones with an equivalent, as shown above?


Jorge Zapata


2 Answers

You could write a series of regular expressions like:

s = regexprep(s,'(?:À|Á|Â|Ã|Ä|Å)','A')
s = regexprep(s,'(?:Ì|Í|Î|Ï)','I')

and so on for the rest of the accented characters... (for both upper/lower cases)
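As a minimal sketch of how the full series might look, regexprep also accepts paired cell arrays of patterns and replacements, so the substitutions can be done in one call. The test string and the exact set of variants in each character class are assumptions for illustration, and the input is assumed to use precomposed characters (a letter followed by a separate combining accent would not match).

```matlab
% Sketch: one regexprep call with paired pattern/replacement cell arrays.
% The character classes below are an illustrative, incomplete selection.
s = 'ÁÉÍÓÚáéíóú Ññ';   % made-up test input

patterns = {'[ÀÁÂÃÄÅ]', '[ÈÉÊË]', '[ÌÍÎÏ]', '[ÒÓÔÕÖ]', '[ÙÚÛÜ]', ...
            '[àáâãäå]', '[èéêë]', '[ìíîï]', '[òóôõö]', '[ùúûü]', ...
            'Ñ', 'ñ'};
replacements = {'A', 'E', 'I', 'O', 'U', ...
                'a', 'e', 'i', 'o', 'u', ...
                'N', 'n'};

% Each pattern is paired with the replacement at the same position
s = regexprep(s, patterns, replacements);
disp(s)
```

A character class such as [ÌÍÎÏ] matches the same characters as the alternation (?:Ì|Í|Î|Ï) and is a little shorter to write.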

Warning: there are many accented variants even within a small subset of the Latin alphabet, so the list of patterns grows quickly.

A simpler example:

chars_old = 'ÁÉÍÓÚáéíóú';
chars_new = 'AEIOUaeiou';

str = 'Ámró';
[tf,loc] = ismember(str, chars_old);
str(tf) = chars_new( loc(tf) )

The string before:

Ámró

and after:

Amro
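The question also asks about Ñ, which the lookup strings above do not include. As a sketch, the same ismember approach could be extended by appending one character to each string, keeping both the same length. The test string is made up for illustration.

```matlab
% Sketch: same lookup as above, with Ñ/ñ appended to both strings.
chars_old = 'ÁÉÍÓÚáéíóúÑñ';
chars_new = 'AEIOUaeiouNn';

% The mapping is positional, so the two strings must have equal length
assert(numel(chars_old) == numel(chars_new));

str = 'Ñandú, Ámró';   % made-up test input

% tf marks characters found in chars_old; loc gives their position there
[tf, loc] = ismember(str, chars_old);

% Replace only the matched characters; everything else is left untouched
str(tf) = chars_new(loc(tf));
disp(str)
```

Characters that are not listed in chars_old, such as the comma and the space, pass through unchanged.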


The following PowerShell code strips the diacritics from characters such as ÅÄÖ by normalizing the string.

function inputWash {
    param([string]$inputString)
    # Decompose each character into its base letter plus combining marks (form D)
    [string]$formD = $inputString.Normalize([System.Text.NormalizationForm]::FormD)
    $stringBuilder = new-object System.Text.StringBuilder
    for ($i = 0; $i -lt $formD.Length; $i++){
        $unicodeCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($formD[$i])
        $nonSPacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
        # Keep everything except the combining marks
        if($unicodeCategory -ne $nonSPacingMark){
            $stringBuilder.Append($formD[$i]) | out-null
        }
    }
    $string = $stringBuilder.ToString().Normalize([System.text.NormalizationForm]::FormC)
    return $string.toLower()
}
# PowerShell functions are called without parentheses around the arguments
Write-Host (inputWash "ÖÄÅÑÜ")


Omit .toLower() if you don't want the result converted to lowercase.

Otto Remse
