I am trying to build a regex-based function that removes any non-alphanumeric characters and also removes all duplicate characters, e.g. aabcd*def%gGGhhhijkklmnoP\1223 would become abcddefgGhijklmnoPR3. I can remove the special characters easily, but I can't work out how to remove the duplicate characters. This is my current code for removing the special characters:
var oldString = "aabcd*def%gGGhhhijkklmnoP\122";
var filtered = oldString.replace(/[^\w\s]/gi, "");
How can I extend the above regex to also remove duplicate characters, including duplicates that are separated by non-alphanumeric characters?
The regex is /[^\w\s]|(.)\1/gi
Test here: http://jsfiddle.net/Cte94/
It uses a backreference: (.) matches any character, and \1 requires that the same character follows it.
Unless by "check for duplicate characters" you meant that aaa => a.
In that case it's /[^\w\s]|(.)(?=\1)/gi
Test here: http://jsfiddle.net/Cte94/1/
Be aware that neither regex distinguishes case: A == a, so Aa counts as a repetition. If you don't want that, remove the i from /gi.
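To make the difference between the two patterns above concrete, here is a small sketch. The sample string and variable names are made up for illustration; only the regexes come from the answer. It suggests that the first pattern deletes a doubled pair entirely, the lookahead pattern keeps one character of each run, and dropping the i flag stops Aa from being treated as a repetition.

```javascript
// Hypothetical sample input, chosen only to make the differences visible.
var sample = "aab*ccc%dAa";

// (.)\1 consumes both characters of a pair, so every matched pair is deleted.
var pairsRemoved = sample.replace(/[^\w\s]|(.)\1/gi, "");

// (.)(?=\1) consumes only the first character and peeks at the next one,
// so the last character of each run survives.
var collapsed = sample.replace(/[^\w\s]|(.)(?=\1)/gi, "");

// Same pattern without the i flag: "A" and "a" are different characters.
var collapsedCaseSensitive = sample.replace(/[^\w\s]|(.)(?=\1)/g, "");

console.log(pairsRemoved);           // bcd
console.log(collapsed);              // abcda
console.log(collapsedCaseSensitive); // abcdAa
```

Note that a single pass like this does not merge duplicates that are separated by a special character: in "a*a" the lookahead sees "*" rather than the second "a".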
\1+ is the key: it matches one or more further repetitions of the captured character.
"aabcdd".replace(/(\w)\1+/g, function (str, match) {
    // str is the whole run, match is the captured character; keep one copy.
    return match[0];
}); // abcd
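For the part of the question about duplicates separated by non-alphanumeric characters, one possible approach is to do the work in two passes: strip first, then collapse runs with \1+. This is only a sketch. The function name stripAndCollapse is hypothetical, and it assumes that underscores and whitespace should be removed too, which is stricter than the [^\w\s] class in the question.

```javascript
// Hypothetical helper: the name and the exact character class are assumptions.
function stripAndCollapse(input) {
    // Pass 1: drop everything that is not an ASCII letter or digit.
    var alnumOnly = input.replace(/[^a-z0-9]/gi, "");
    // Pass 2: collapse each run of a repeated character to one copy.
    // No i flag here, so "g" and "G" stay distinct.
    return alnumOnly.replace(/(.)\1+/g, "$1");
}

console.log(stripAndCollapse("aabcd*def%gGGhhhijkkl")); // abcdefgGhijkl
console.log(stripAndCollapse("a*a"));                   // a
```

Because the special characters are gone before the second pass runs, "d*d" becomes "dd" and is then collapsed to "d". Add the i flag to the second regex if Aa should also count as a repetition.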
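Non-regex version (collapses consecutive duplicates only; it does not strip special characters):
var oldString = "aabcd*def%gGGhhhijkklmnoP\122";
var newString = "";
var len = oldString.length;
var c = oldString[0];
for ( var i = 1; i < len; ++i ) {
    // Emit the previous character only when the run it belongs to ends.
    if ( c != oldString[i] ) {
        newString += c;
    }
    c = oldString[i];
}
// The loop never emits the final run, so append its character here.
if ( len > 0 ) {
    newString += c;
}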