JavaScript regular expression to leave only words (international version)

Question

I'm trying to strip a string to leave only word characters remaining. For anything using the Latin alphabet, I can manage it quite easily with

str = str.replace(/\W/g, '').replace(/[0-9]/g, '');

(I think I probably don't need both replaces, but I'm very new to regular expressions and not sure what I'm doing)
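
On the two-replace point, a minimal sketch of a single-pass version for Latin-only text, assuming the goal is just to merge both calls into one character class. The name stripNonLetters is hypothetical and not from the original post. Note that \W leaves the underscore alone, so it is listed explicitly, and that spaces are removed as well, exactly as in the two-call version.

```javascript
// Hypothetical helper: one pass instead of two chained replaces.
// [\W\d_] matches any non-word character, any digit, or an underscore,
// which leaves only the ASCII letters a-z and A-Z behind.
function stripNonLetters(str) {
  return str.replace(/[\W\d_]/g, '');
}

console.log(stripNonLetters("test!(£)98* string")); // "teststring"
```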

However, this also strips out non-Latin characters such as Chinese or Arabic.

How would I write a function to do this?

strOne = "test!(£)98* string";
strTwo = "你好，325!# 世界";

cleanUp(strOne); // Output: "test string"
cleanUp(strTwo); // Output: "你好 世界"

(In case anyone is wondering, the Chinese is me running "hello world" through an online translator.)

On a library note, I don't know if it's relevant, but I'm using Dojo and would like to avoid jQuery if possible.

collapsar · Accepted Answer

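You need a regex pattern using Unicode character properties, namely \P{Letter}, which matches any character that is not a letter.

Unfortunately, the native JS regex engine did not support these constructs when this answer was written (cf. the MDN docs); engines that implement ES2018 accept them when the u flag is set. There is also (at least) the third-party XRegExp library, whose Unicode plugin adds the support.

Code sample: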

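// Assumes XRegExp and its Unicode addons are already loaded.
var regex, str;

str = "whatever";

// Double the backslash: in a plain string literal '\P' collapses to 'P'.
// The 'g' flag replaces every non-letter (spaces included), not only the first.
regex = XRegExp('\\P{Letter}', 'g');
str   = XRegExp.replace(str, regex, '');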
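
For comparison, here is a minimal sketch of the cleanUp function from the question without any library, assuming an ES2018+ engine where the u flag enables \p{...} property escapes. This swaps in the native feature rather than XRegExp. Keeping whitespace and collapsing runs of it are assumptions made to match the outputs the question asks for.

```javascript
// Assumption: ES2018+ engine (Unicode property escapes require the `u` flag).
// cleanUp is the function name used in the question.
function cleanUp(str) {
  return str
    .replace(/[^\p{L}\s]/gu, '') // drop everything that is neither a letter nor whitespace
    .replace(/\s+/g, ' ')        // collapse whitespace runs left behind by the removals
    .trim();                     // remove leading and trailing whitespace
}

console.log(cleanUp("test!(£)98* string")); // "test string"
console.log(cleanUp("你好，325!# 世界"));     // "你好 世界"
```

\p{L} covers letters only, so combining marks (category M) are stripped too; scripts that rely on them may need [^\p{L}\p{M}\s] instead.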

Tags: javascript, regex

Emma
