
Remove umlauts or special characters in a JavaScript string

I have never worked with umlauts or special characters in JavaScript strings before. How can I remove them?

For example, I have this in JavaScript:

var oldstr = "Bayern München";
var str = oldstr.split(' ').join('-');

The result is Bayern-München, which is easy enough. But now I also want to remove umlauts and special characters, as in:

Real Sporting de Gijón.

How can I do this?

Kind regards,

Frank

Frank asked Sep 19 '25 14:09


1 Answer

replace should be able to do it for you, e.g.:

var str = str.replace(/ü/g, 'u');

...of course ü and u are not the same letter. :-)

If you're trying to replace all characters outside a given range with something (like a -), you can do that by specifying a range:

var str = str.replace(/[^A-Za-z0-9\-_]/g, '-');

That replaces all characters that aren't English letters, digits, -, or _ with -. (The character range is the [...] bit, the ^ at the beginning means "not".)
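
To make the range behaviour concrete, here is a small sketch that runs it on the two names from the question. The helper name replaceOutsideRange is made up for illustration; it isn't from any library.

```javascript
// Hypothetical helper: replace everything outside A-Z, a-z, 0-9, - and _ with '-'
function replaceOutsideRange(input) {
  return input.replace(/[^A-Za-z0-9\-_]/g, '-');
}

// \u00fc is ü and \u00f3 is ó, written as escapes to sidestep file-encoding issues
console.log(replaceOutsideRange("Bayern M\u00fcnchen"));         // Bayern-M-nchen
console.log(replaceOutsideRange("Real Sporting de Gij\u00f3n")); // Real-Sporting-de-Gij-n
```

Note that the space and the accented letter both end up as -, since neither is in the allowed range.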

But that ("Bayern-M-nchen") may be a bit unpleasant for Mr. München to look at. :-) You could use a function passed into replace to try to just drop diacriticals:

var str = str.replace(/[^A-Za-z0-9\-_]/g, function(ch) {
  // Characters that look a bit like 'a'
  if ("áàâä".indexOf(ch) >= 0) { // There are a lot more than this
    return 'a';
  }
  // Characters that look a bit like 'u'
  if ("úùûü".indexOf(ch) >= 0) { // There are a lot more than this
    return 'u';
  }
  /* ...long list of others...*/
  // Default
  return '-';
});
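
If the list of if blocks gets long, one possible variation (not necessarily better, just a sketch) is to keep the mappings in a lookup object and have the callback consult it. FOLD_MAP and foldToAscii are hypothetical names, and the table is deliberately incomplete.

```javascript
// Hypothetical lookup table (nowhere near complete): accented char -> plain letter
var FOLD_MAP = {
  "\u00e1": "a", "\u00e0": "a", "\u00e2": "a", "\u00e4": "a", // á à â ä
  "\u00f3": "o", "\u00f2": "o", "\u00f4": "o", "\u00f6": "o", // ó ò ô ö
  "\u00fa": "u", "\u00f9": "u", "\u00fb": "u", "\u00fc": "u"  // ú ù û ü
};

function foldToAscii(input) {
  return input.replace(/[^A-Za-z0-9\-_]/g, function (ch) {
    // Use the mapped letter if we have one, otherwise fall back to '-'
    return Object.prototype.hasOwnProperty.call(FOLD_MAP, ch) ? FOLD_MAP[ch] : "-";
  });
}

console.log(foldToAscii("Bayern M\u00fcnchen"));         // Bayern-Munchen
console.log(foldToAscii("Real Sporting de Gij\u00f3n")); // Real-Sporting-de-Gijon
```

Adding a new character then means adding one entry to the table rather than another branch.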


The above is optimized for long strings. If the string itself is short, you may be better off with repeated regexps:

var str = str.replace(/[áàâä]/g, 'a')
             .replace(/[úùûü]/g, 'u')
             .replace(/[^A-Za-z0-9\-_]/g, '-');

...but that's speculative.

Note that literal characters in JavaScript strings are totally fine, but you can run into trouble with the encoding of files. I tend to stick to Unicode escapes. So for instance, the above would be:

var str = str.replace(/[\u00e4\u00e2\u00e0\u00e1]/g, 'a')
             .replace(/[\u00fc\u00fb\u00f9\u00fa]/g, 'u')
             .replace(/[^A-Za-z0-9\-_]/g, '-');

...but again, there are a lot more characters to handle...
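
A different technique from the hand-written lists above: in engines that support String.prototype.normalize (ES2015 and later), you can decompose each accented letter into a base letter plus a combining mark (form NFD) and then drop the combining marks in the range U+0300 to U+036F. This is only a sketch under that assumption, and stripDiacritics and toSlug are made-up names. Letters without a canonical decomposition, such as ß, are left alone by the first step and still hit the - fallback.

```javascript
// Assumes String.prototype.normalize is available (ES2015+)
function stripDiacritics(input) {
  // NFD splits e.g. ü into 'u' + U+0308; the regex then removes the combining marks
  return input.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical wrapper: strip diacritics first, then apply the same range fallback as above
function toSlug(input) {
  return stripDiacritics(input).replace(/[^A-Za-z0-9\-_]/g, "-");
}

console.log(toSlug("Bayern M\u00fcnchen"));         // Bayern-Munchen
console.log(toSlug("Real Sporting de Gij\u00f3n")); // Real-Sporting-de-Gijon
```

Whether dropping the mark is the right mapping is a separate question (ü is often written ue in German), so treat this as a shortcut and not a transliteration.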

T.J. Crowder answered Sep 21 '25 04:09