Remove all spaces between Chinese words with regex

Tags:

javascript

regex

I would like to remove all spaces among Chinese text only.

My text: "請把這裡的 10 多個字合併. Can you help me?"

Ideal output: "請把這裡的 10 多個字合併. Can you help me?"

var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'; str = str.replace("/\&nbsp;/", "");

I have studied a similar question for Python but it seems not to work in my situation so I brought my question here for some help.

583

asked Jan 14 '19 09:01

lewishole

1 Answers

Getting to the Chinese char matching pattern

Using the Unicode Tools, the \p{Han} Unicode property class that matches any Chinese char can be translated into

[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002F800-\U0002FA1D]

In ES6, to match a single Chinese char, it can be used as

/[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\u{20000}-\u{2A6D6}\u{2A700}-\u{2B734}\u{2B740}-\u{2B81D}\u{2B820}-\u{2CEA1}\u{2CEB0}-\u{2EBE0}\u{2F800}-\u{2FA1D}]/u

Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get

(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])

pattern to match any Chinese char using JS RegExp.

So, you may use

s.replace(/([\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])\s+(?=(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]))/g, '$1')

See the regex demo.

If your JS environment is ECMAScript 2018 compliant you may use a shorter

s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')

Pattern details

(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char
\s+ - any 1+ whitespaces (any Unicode whitespace)
(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.

JS demo:

var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";  var HanChr = "[\\u2E80-\\u2E99\\u2E9B-\\u2EF3\\u2F00-\\u2FD5\\u3005\\u3007\\u3021-\\u3029\\u3038-\\u303B\\u3400-\\u4DB5\\u4E00-\\u9FEF\\uF900-\\uFA6D\\uFA70-\\uFAD9]|[\\uD840-\\uD868\\uD86A-\\uD86C\\uD86F-\\uD872\\uD874-\\uD879][\\uDC00-\\uDFFF]|\\uD869[\\uDC00-\\uDED6\\uDF00-\\uDFFF]|\\uD86D[\\uDC00-\\uDF34\\uDF40-\\uDFFF]|\\uD86E[\\uDC00-\\uDC1D\\uDC20-\\uDFFF]|\\uD873[\\uDC00-\\uDEA1\\uDEB0-\\uDFFF]|\\uD87A[\\uDC00-\\uDFE0]|\\uD87E[\\uDC00-\\uDE1D]";   console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));

A test for the regex compliant with the ECMAScript 2018 standard:

var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";  console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));

196

answered Oct 06 '22 09:10

Wiktor Stribiżew

Related questions
                            
                                Set Session variable using javascript in PHP
                            
                                Is there a reason JavaScript developers don't use Array.push()?
                            
                                jasmine: spyOn(obj, 'method').andCallFake or and.callFake?
                            
                                How npmjs.com calculates the code quality
                            
                                What are some JavaScript unit testing and mocking frameworks you have used? [closed]
                            
                                Why can't I display a pound (£) symbol in HTML?
                            
                                Is this an example of variable shadowing in JavaScript?
                            
                                Difference between response.setHeader and response.writeHead?
                            
                                ChangeDetectionStrategy.OnPush and Observable.subscribe in Angular 2
                            
                                Conflicting boolean values of an empty JavaScript array
                            
                                How does the new operator work in JavaScript?
                            
                                SYNTAX_ERR: DOM Exception 12 - Hmmm
                            
                                How to use underscore's "intersection" on objects?
                            
                                Iframe without src but still has content?
                            
                                Are ES6 module imports hoisted?
                            
                                Localize Strings in Javascript
                            
                                HTML5 Notification not working in Mobile Chrome
                            
                                Using namespace spread over multiple module files in TypeScript
                            
                                D3 Tree Layout Separation Between Nodes using NodeSize
                            
                                window.history.pushState refreshing the browser

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With