Regex to replace 'NO-BREAK SPACE'

Tags:

javascript

string

regex

I am looking for a regex to replace 'NO-BREAK SPACE's in a string.

There are some questions on SO related to 'NO-BREAK SPACE', but none seems to point me to the right answer.

So far, I have tried the following without success (the second character of the string "A B" is a no-break space):

"A B".replace(new RegExp(String.fromCharCode(160),"g"),"xxx");
"A B".replace($('<b>&nbsp;</b>').text(), 'xxx');
"A B".replace(/\xA0/,'xxx');
"A B".replace(/\\xA0/,'xxx');
"A B".replace(/\u00A0/,'xxx');
"A B".replace(/\\u00A0/,'xxx');

UPDATE: Stupid me. The truth is, I had been testing with the wrong character for quite some time.
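
Since the update points to a wrong test character, one way to rule that out is to print the code points of the string before blaming the regex. This is a minimal sketch, assuming the test string is built with an explicit `\u00A0` escape so there is no doubt about its content; the helper name `codePoints` is made up for illustration.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Build the test string with an escape so the middle character is certainly U+00A0.
const sample = "A\u00A0B";

console.log(codePoints(sample)); // [ 'U+0041', 'U+00A0', 'U+0042' ]

// With a real no-break space, the straightforward pattern matches.
const replaced = sample.replace(/\u00A0/g, "xxx");
console.log(replaced); // "AxxxB"
```

If the middle entry prints as U+0020, the string contains an ordinary space and none of the no-break-space patterns can match. The double-backslash variants (`/\\xA0/`, `/\\u00A0/`) look for a literal backslash followed by text, so they do not match U+00A0 in either case.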

asked Aug 03 '15 14:08

Thariama

2 Answers

Apart from the ordinary space and the NO-BREAK SPACE, there are other space characters that can appear in strings.

Here is the complete Unicode list for spaces. Source: http://jkorpela.fi/chars/spaces.html

Number	Character name
\u0020	space
\u00A0	no-break space
\u1680	Ogham space mark
\u180E	Mongolian vowel separator
\u2000	en quad
\u2001	em quad
\u2002	en space (nut)
\u2003	em space (mutton)
\u2004	three-per-em space (thick space)
\u2005	four-per-em space (mid space)
\u2006	six-per-em space
\u2007	figure space
\u2008	punctuation space
\u2009	thin space
\u200A	hair space
\u200B	zero width space
\u202F	narrow no-break space
\u205F	medium mathematical space
\u3000	ideographic space
\uFEFF	zero width no-break space

Therefore, to replace all of these unusual spaces with an ordinary space:

.replace(/[\u00A0\u1680\u180E\u2000-\u200B\u202F\u205F\u3000\uFEFF]/g, " ")

From the above, you may exclude \u1680, since it's "usually not really a space but a dash".
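
The replacement above can be wrapped in a small function. This is a sketch rather than a definitive implementation: the names `ODD_SPACES`, `ODD_SPACES_NO_OGHAM` and `normalizeSpaces`, and the `keepOgham` parameter, are made up for illustration. The `g` flag is what makes `replace` touch every occurrence instead of only the first.

```javascript
// Character class from the table above, minus the ordinary space (U+0020).
const ODD_SPACES = /[\u00A0\u1680\u180E\u2000-\u200B\u202F\u205F\u3000\uFEFF]/g;

// Same class without U+1680 (Ogham space mark).
const ODD_SPACES_NO_OGHAM = /[\u00A0\u180E\u2000-\u200B\u202F\u205F\u3000\uFEFF]/g;

// Hypothetical helper: swap every listed space for a regular space.
// Pass keepOgham = true to leave U+1680 untouched.
function normalizeSpaces(str, keepOgham = false) {
  return str.replace(keepOgham ? ODD_SPACES_NO_OGHAM : ODD_SPACES, " ");
}

console.log(normalizeSpaces("A\u00A0B\u2003C\u202FD")); // "A B C D"
console.log(normalizeSpaces("A\u1680B", true) === "A\u1680B"); // true
```

Note that the zero-width characters (U+200B, U+FEFF) become visible spaces with this replacement; depending on the text, replacing them with an empty string may be the better choice.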

answered Sep 22 '22 06:09

Rakesh Chaudhari

Apparently there is no Unicode category that covers this use case.
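
On the JavaScript side of the question, the closest category is probably `Zs` (Space_Separator), available through Unicode property escapes. The sketch below assumes an engine with ES2018 property-escape support and shows that `Zs` leaves out the zero-width characters from the list, so they have to be added by hand. The variable names are illustrative only.

```javascript
// \p{Zs} is the Unicode "Space_Separator" general category; it requires the u flag.
const spaceSeparator = /\p{Zs}/u;

console.log(spaceSeparator.test("\u00A0")); // true  (no-break space)
console.log(spaceSeparator.test("\u3000")); // true  (ideographic space)
console.log(spaceSeparator.test("\u200B")); // false (zero width space is a format character)
console.log(spaceSeparator.test("\uFEFF")); // false (zero width no-break space is a format character)

// One possible combination: the category plus the listed characters it leaves out.
const allListedSpaces = /[\p{Zs}\u180E\u200B\uFEFF]/gu;
console.log("A\u00A0B\u200BC".replace(allListedSpaces, " ")); // "A B C"
```

`\p{Zs}` also contains the ordinary space U+0020, which is harmless when the replacement is a space but matters if the matches are removed or counted.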

The regex in @Rakesh's answer was missing some characters from the list of Unicode spaces, and I needed the C# flavor.

Here the list is converted to a C# expression that produces the regex pattern:

// Requires System.Linq. The escapes are wrapped in a character class [...];
// curly braces would be matched as literal characters by the regex engine.
string.Concat("[", string.Concat(new[]
{
    new { c = '\u0020', desc = "space" },
    new { c = '\u00A0', desc = "no-break space" },
    new { c = '\u1680', desc = "Ogham space mark" },
    new { c = '\u180E', desc = "Mongolian vowel separator" },
    new { c = '\u2000', desc = "en quad" },
    new { c = '\u2001', desc = "em quad" },
    new { c = '\u2002', desc = "en space (nut)" },
    new { c = '\u2003', desc = "em space (mutton)" },
    new { c = '\u2004', desc = "three-per-em space (thick space)" },
    new { c = '\u2005', desc = "four-per-em space (mid space)" },
    new { c = '\u2006', desc = "six-per-em space" },
    new { c = '\u2007', desc = "figure space" },
    new { c = '\u2008', desc = "punctuation space" },
    new { c = '\u2009', desc = "thin space" },
    new { c = '\u200A', desc = "hair space" },
    new { c = '\u200B', desc = "zero width space" },
    new { c = '\u202F', desc = "narrow no-break space" },
    new { c = '\u205F', desc = "medium mathematical space" },
    new { c = '\u3000', desc = "ideographic space" },
    new { c = '\uFEFF', desc = "zero width no-break space" },
}
.Select(a => $"\\u{(int)a.c:X4}")
), "]")

// Becomes "[\u0020\u00A0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u200B\u202F\u205F\u3000\uFEFF]"

For copy-paste and viewing in LINQPad, use this projection instead:
.Select(a => new { a.c, num = (int)a.c, part = $"\\u{(int)a.c:X4}", a.desc })
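
For readers coming from the JavaScript tags on the question, the same idea of generating the pattern from a list can be sketched as follows. The names `spaceCodePoints`, `classSource` and `anyListedSpace` are hypothetical, and the escapes are joined into a character class rather than an alternation.

```javascript
// Code points from the table (ordinary space U+0020 included, as in the C# list).
const spaceCodePoints = [
  0x0020, 0x00a0, 0x1680, 0x180e,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
  0x2006, 0x2007, 0x2008, 0x2009, 0x200a, 0x200b,
  0x202f, 0x205f, 0x3000, 0xfeff,
];

// Turn each code point into a \uXXXX escape and wrap them in a character class.
const classSource =
  "[" +
  spaceCodePoints
    .map((cp) => "\\u" + cp.toString(16).toUpperCase().padStart(4, "0"))
    .join("") +
  "]";

// Compile the generated source; the g flag makes replace() handle every match.
const anyListedSpace = new RegExp(classSource, "g");

console.log(classSource.startsWith("[\\u0020\\u00A0")); // true
console.log("A\u2009B\u00A0C".replace(anyListedSpace, "_")); // "A_B_C"
```

Keeping the code points in an array makes it easy to drop entries such as U+0020 or U+1680 before the pattern is built.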

answered Sep 22 '22 06:09

Grastveit
