I am looking for a regex to replace NO-BREAK SPACE characters in a string.
There are some questions on SO related to NO-BREAK SPACE, but none seems to point me to the right answer.
So far, I have tried the following without success (the second character of the string "A B" is a no-break space):
"A B".replace(new RegExp(String.fromCharCode(160),"g"),"xxx");
"A B".replace($('<b> </b>').text(), 'xxx');
"A B".replace(/\xA0/,'xxx');
"A B".replace(/\\xA0/,'xxx');
"A B".replace(/\u00A0/,'xxx');
"A B".replace(/\\u00A0/,'xxx');
UPDATE: Stupid me. The truth is that I had been testing with the wrong character for quite some time.
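For reference, here is a quick check of the attempts above. It is a minimal sketch that assumes the string really contains U+00A0, written as the escape \u00A0 so it cannot be confused with an ordinary space. The variable names are illustrative only, and the g flag is added so that every occurrence is replaced.

```javascript
// Build the test string with an explicit escape so the second character
// is guaranteed to be U+00A0 (NO-BREAK SPACE), not an ordinary space.
const input = "A\u00A0B";

// These three forms all match the no-break space.
const viaCharCode = input.replace(new RegExp(String.fromCharCode(160), "g"), "xxx");
const viaHex = input.replace(/\xA0/g, "xxx");
const viaUnicode = input.replace(/\u00A0/g, "xxx");

// Doubling the backslash searches for a literal backslash followed by "xA0",
// so the string is left unchanged.
const doubled = input.replace(/\\xA0/g, "xxx");

console.log(viaCharCode, viaHex, viaUnicode, doubled === input);
```

If none of the single-backslash forms match, the character in the string is probably not U+00A0; input.charCodeAt(1) shows which code unit is actually there.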
If you prefer to just search for non-breaking spaces, you can, in step 2, type Ctrl+Shift+Spacebar, which inserts a non-breaking space character (^s) in the Find What box.
A backslash in a regular expression escapes the character that follows it, so that a metacharacter is matched literally. It also introduces shorthand character classes, such as \w for a word character or \s for a whitespace character.
The u flag enables various Unicode-related features. With the "u" flag: Any Unicode code point escapes ( \u{xxxx} , \p{UnicodePropertyValue} ) will be interpreted as such instead of as literal characters. Surrogate pairs will be interpreted as whole characters instead of two separate characters.
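A small sketch of what the u flag changes for this problem, assuming a JavaScript engine that supports Unicode property escapes (ES2018 or later). The sample string and variable names are made up for illustration.

```javascript
const text = "A\u00A0B\u2003C"; // no-break space, then em space

// With the "u" flag, \u{A0} is a code point escape for U+00A0.
const onlyNbsp = text.replace(/\u{A0}/gu, " ");

// Without the flag, the same pattern is not a code point escape,
// so it does not match the no-break space.
const matchesWithoutFlag = /\u{A0}/.test("\u00A0");

// \p{Zs} (the Space_Separator category) also requires the "u" flag.
// It covers U+0020, U+00A0 and the other spaces that have width,
// but not zero-width characters such as U+200B.
const allSeparators = text.replace(/\p{Zs}/gu, " ");
const zeroWidthIsSeparator = /\p{Zs}/u.test("\u200B");

console.log(JSON.stringify(onlyNbsp), JSON.stringify(allSeparators));
console.log(matchesWithoutFlag, zeroWidthIsSeparator);
```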
\s stands for “whitespace character”. Again, which characters this actually includes, depends on the regex flavor. In all flavors discussed in this tutorial, it includes [ \t\r\n\f]. That is: \s matches a space, a tab, a carriage return, a line feed, or a form feed.
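In JavaScript specifically, \s is broader than the ASCII set quoted above: it also matches U+00A0, the other Unicode space separators, and U+FEFF. The following sketch illustrates the difference between matching \s and matching the no-break space directly; the strings and names are illustrative.

```javascript
// In JavaScript, \s matches the no-break space as well as ASCII whitespace.
const nbspIsWhitespace = /\s/.test("\u00A0");

// The zero width space (U+200B) is not whitespace as far as \s is concerned.
const zwspIsWhitespace = /\s/.test("\u200B");

const mixed = "A\u00A0B\tC";

// Matching U+00A0 directly leaves the tab alone.
const replacedNbspOnly = mixed.replace(/\u00A0/g, " ");

// Matching \s replaces the tab as well.
const replacedAllWhitespace = mixed.replace(/\s/g, " ");

console.log(nbspIsWhitespace, zwspIsWhitespace);
console.log(JSON.stringify(replacedNbspOnly), JSON.stringify(replacedAllWhitespace));
```

So \s is convenient when any whitespace should be replaced, while an explicit \u00A0 is the safer choice when tabs and line breaks must be preserved.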
Apart from the ordinary space and NO-BREAK SPACE, other space characters can also appear in strings.
Here is the complete Unicode list for spaces. Source: http://jkorpela.fi/chars/spaces.html
Number | Character name |
---|---|
\u0020 | space |
\u00A0 | no-break space |
\u1680 | Ogham space mark |
\u180E | Mongolian vowel separator |
\u2000 | en quad |
\u2001 | em quad |
\u2002 | en space (nut) |
\u2003 | em space (mutton) |
\u2004 | three-per-em space (thick space) |
\u2005 | four-per-em space (mid space) |
\u2006 | six-per-em space |
\u2007 | figure space |
\u2008 | punctuation space |
\u2009 | thin space |
\u200A | hair space |
\u200B | zero width space |
\u202F | narrow no-break space |
\u205F | medium mathematical space |
\u3000 | ideographic space |
\uFEFF | zero width no-break space |
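Therefore, to replace all of these unusual spaces with an ordinary space (note the g flag, without which only the first match is replaced):
.replace(/[\u00A0\u1680\u180E\u2000-\u200B\u202F\u205F\u3000\uFEFF]/g, " ")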
From the above, you may exclude \u1680, since it's "usually not really a space but a dash".
Apparently no single Unicode category covers this use case.
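The replacement described in this answer can be wrapped in a small helper. This is a sketch only: normalizeSpaces and ODD_SPACES are hypothetical names, and the character class is taken from the table above with U+0020 left out because it is the replacement character.

```javascript
// Character class built from the table above. The "g" flag makes
// replace() act on every occurrence rather than just the first.
const ODD_SPACES = /[\u00A0\u1680\u180E\u2000-\u200B\u202F\u205F\u3000\uFEFF]/g;

// Hypothetical helper: swaps every listed space character for an ordinary space.
function normalizeSpaces(str) {
  return str.replace(ODD_SPACES, " ");
}

console.log(JSON.stringify(normalizeSpaces("a\u00A0b\u2003c\u3000d")));
```

To keep the Ogham space mark untouched, drop \u1680 from the class.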
The regex in @Rakesh's answer was missing some characters from the list of Unicode spaces, and I needed the C# flavor.
Here the list is converted to a C# expression that produces the regex pattern:
// Builds a regex character class covering every space character listed above.
// Requires System.Linq (imported by default in LINQPad).
// Square brackets are used because "{...|...}" is not a valid grouping in a regex.
string.Concat("[", string.Concat(new[]
{
new { c = '\u0020', desc = "space" },
new { c = '\u00A0', desc = "no-break space" },
new { c = '\u1680', desc = "Ogham space mark" },
new { c = '\u180E', desc = "Mongolian vowel separator" },
new { c = '\u2000', desc = "en quad" },
new { c = '\u2001', desc = "em quad" },
new { c = '\u2002', desc = "en space (nut)" },
new { c = '\u2003', desc = "em space (mutton)" },
new { c = '\u2004', desc = "three-per-em space (thick space)" },
new { c = '\u2005', desc = "four-per-em space (mid space)" },
new { c = '\u2006', desc = "six-per-em space" },
new { c = '\u2007', desc = "figure space" },
new { c = '\u2008', desc = "punctuation space" },
new { c = '\u2009', desc = "thin space" },
new { c = '\u200A', desc = "hair space" },
new { c = '\u200B', desc = "zero width space" },
new { c = '\u202F', desc = "narrow no-break space" },
new { c = '\u205F', desc = "medium mathematical space" },
new { c = '\u3000', desc = "ideographic space" },
new { c = '\uFEFF', desc = "zero width no-break space" },
}
// Format each character as a \uXXXX escape that the regex engine understands.
.Select(a => $"\\u{(int)a.c:X4}")
), "]")
// Becomes "[\u0020\u00A0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u200B\u202F\u205F\u3000\uFEFF]"
To copy, paste, and view the list in LINQPad, use this projection instead:
.Select(a => new { a.c, num = (int)a.c, part = $"\\u{(int)a.c:X4}", a.desc })