
Replacing umlauts in JS

I am comparing strings and need to replace umlauts in JS, but JS does not seem to recognize the umlauts in the strings. The text comes from the database, and the umlauts display correctly in the browser.

function replaceUmlauts(string)
{
    // Declared with let so that value does not become an implicit global
    let value = string.toLowerCase();
    value = value.replace(/ä/g, 'ae');
    value = value.replace(/ö/g, 'oe');
    value = value.replace(/ü/g, 'ue');
    return value;
}

As search patterns I tried:

  • "ä", "ö", "ü"
  • /ä/, /ö/, /ü/
  • "&auml;", "&ouml;", "&uuml;" (well, total despair ;-))

To be sure that the problem is not the replace function, I tried indexOf:

console.log(value.indexOf('ä'));

But the output with all patterns is: -1

So I guess it is some kind of encoding problem, but as I said, the umlauts look fine on the page.
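
One way to check the encoding guess is to look at the code units the strings actually contain, since two strings can render identically and still differ. A minimal sketch; `dumpCodeUnits` is a hypothetical helper name, not something from the page's code:

```javascript
// Hypothetical helper: list each UTF-16 code unit of a string as U+XXXX.
function dumpCodeUnits(str) {
  return Array.from({ length: str.length }, (_, i) =>
    'U+' + str.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0')
  );
}

// Precomposed "ä" is a single code unit.
console.log(dumpCodeUnits('\u00e4'));   // [ 'U+00E4' ]
// "a" followed by a combining diaeresis renders the same but is two code units.
console.log(dumpCodeUnits('a\u0308'));  // [ 'U+0061', 'U+0308' ]
```

Running it on both the database text and the 'ä' literal from the script should show where the two differ, if they do.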

Any ideas? This seems so simple...

EDIT: Even though I found my answer, the problem was not really solved "at the root" (the encoding). This is my page encoding:

<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

The database has: utf8_general_ci

Seems totally alright to me.

SamiSalami asked Jul 25 '12 15:07



3 Answers

Either ensure that your script's encoding is correctly specified (in the <script> tag, or in the page's header/meta if the script is embedded), or write the characters with the \uNNNN syntax, which always resolves unambiguously to a specific Unicode code point.

For example:

str.replace(/\u00e4/g, "ae") 

This will always replace ä with ae, no matter what encoding is set for your page or script, even if that encoding is wrong.

Here are the codes needed for German:

// Ü, ü     \u00dc, \u00fc
// Ä, ä     \u00c4, \u00e4
// Ö, ö     \u00d6, \u00f6
// ß        \u00df
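
Putting the code table to use, here is a minimal sketch of a replacement function written entirely with escapes. The name `replaceGermanChars`, the `GERMAN_MAP` object and the chosen replacements (ae, oe, ue, ss and their capitalised forms) are assumptions for illustration, not part of the answer above. The `normalize('NFC')` call is an extra precaution for text that stores umlauts as a base letter plus a combining diaeresis.

```javascript
// Mapping built from the code table; the replacement strings are an assumption.
const GERMAN_MAP = {
  '\u00dc': 'Ue', '\u00fc': 'ue',
  '\u00c4': 'Ae', '\u00e4': 'ae',
  '\u00d6': 'Oe', '\u00f6': 'oe',
  '\u00df': 'ss',
};

// Hypothetical helper name.
function replaceGermanChars(str) {
  return str
    // Compose "a" + U+0308 into U+00E4 (and so on) so the class below matches.
    .normalize('NFC')
    .replace(/[\u00dc\u00fc\u00c4\u00e4\u00d6\u00f6\u00df]/g, (ch) => GERMAN_MAP[ch]);
}

console.log(replaceGermanChars('\u00c4rger mit F\u00fc\u00dfen')); // "Aerger mit Fuessen"
```

Because the source contains only ASCII, the result should not depend on how the script file itself is decoded.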
Oleg V. Volkov answered Sep 19 '22 18:09


If you want to replace the German umlauts while respecting case, use this (open source, happy to share, written by me):

const umlautMap = {
  '\u00dc': 'UE',
  '\u00c4': 'AE',
  '\u00d6': 'OE',
  '\u00fc': 'ue',
  '\u00e4': 'ae',
  '\u00f6': 'oe',
  '\u00df': 'ss',
}

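function replaceUmlaute(str) {
  return str
    // Capital umlaut followed by a lowercase letter: capitalise only the first letter (Ü -> Ue).
    .replace(/[\u00dc\u00c4\u00d6][a-z]/g, (a) => {
      const big = umlautMap[a.slice(0, 1)];
      return big.charAt(0) + big.charAt(1).toLowerCase() + a.slice(1);
    })
    // Everything else is looked up directly. The keys are joined without '|',
    // because inside a character class '|' would match a literal pipe.
    .replace(new RegExp('['+Object.keys(umlautMap).join('')+']',"g"),
      (a) => umlautMap[a]
    );
}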

const test = ['Übung', 'ÜBUNG', 'üben', 'einüben', 'EINÜBEN', 'Öde ätzende scheiß Übung']
test.forEach((str) => console.log(str + " -> " + replaceUmlaute(str)))

It converts:

  • Übung -> Uebung
  • ÜBUNG -> UEBUNG
  • üben -> ueben
  • einüben -> einueben
  • EINÜBEN -> EINUEBEN
  • and the same for Ä, Ö
  • and simple ß -> ss
Andreas Richter answered Sep 22 '22 18:09


Here's a function that replaces the most common characters to produce a Google-friendly SEO URL:

function deUmlaut(value){
  value = value.toLowerCase();
  value = value.replace(/ä/g, 'ae');
  value = value.replace(/ö/g, 'oe');
  value = value.replace(/ü/g, 'ue');
  value = value.replace(/ß/g, 'ss');
  value = value.replace(/ /g, '-');
  value = value.replace(/\./g, '');
  value = value.replace(/,/g, '');
  value = value.replace(/\(/g, '');
  value = value.replace(/\)/g, '');
  return value;
}
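
The same steps can probably be written more compactly with a lookup table and character classes. A minimal sketch, assuming the same replacements as the function above; `toSlug` is a hypothetical name, and the characters are written as \u escapes so the result does not depend on the script's encoding:

```javascript
// Hypothetical compact variant of the function above.
function toSlug(value) {
  const map = { '\u00e4': 'ae', '\u00f6': 'oe', '\u00fc': 'ue', '\u00df': 'ss' };
  return value
    .toLowerCase()
    .replace(/[\u00e4\u00f6\u00fc\u00df]/g, (ch) => map[ch]) // ä, ö, ü, ß
    .replace(/ /g, '-')                                      // spaces become hyphens
    .replace(/[.,()]/g, '');                                 // drop the same punctuation
}

console.log(toSlug('Gr\u00f6\u00dfe (M\u00fcnchen), Nr. 5')); // "groesse-muenchen-nr-5"
```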
Fidel Gonzo answered Sep 21 '22 18:09