
Python - How to remove spaces between Chinese characters while keeping the spaces between a character and a number?

The real issue may be more complicated, but for now I'm trying to accomplish something a bit easier: removing the space between two Chinese/Japanese characters while keeping the space between a number and a character. An example:

text = "今天特别 热,但是我买了 3 个西瓜。"

The output I want to get is

text = "今天特别热,但是我买了 3 个西瓜。"

I tried a Python script with a regular expression:

import re
text = re.sub(r'\s(?=[^A-z0-9])', '', text)

However, the result is

text = '今天特别热,但是我买了 3个西瓜。'

So I'm struggling with how to keep the space between a character and a number at all times. I also don't want to work around it by adding a space back between "3" and "个".

I'll continue to think about it, but let me know if you have ideas. Thank you so much in advance!

Penny asked Nov 21 '25 05:11


1 Answer

As I understand it, the spaces you need to remove sit between letters.

Use

re.sub(r'(?<=[^\W\d_])\s+(?=[^\W\d_])', '', text)

Details:

  • (?<=[^\W\d_]) - a positive lookbehind requiring a Unicode letter immediately to the left of the current location
  • \s+ - 1+ whitespaces (remove + if only one is expected)
  • (?=[^\W\d_]) - a positive lookahead that requires a Unicode letter immediately to the right of the current location.
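To see what the [^\W\d_] class accepts, here is a small sketch. The helper name is_letter and the sample characters are only illustrative assumptions; the character class itself is the one used above.

```python
import re

# Hypothetical helper; the character class is taken from the pattern above.
LETTER = re.compile(r'[^\W\d_]')

def is_letter(ch):
    # True when the single character ch is a word character
    # that is neither a digit nor an underscore.
    return LETTER.fullmatch(ch) is not None

samples = ['热', 'a', '3', '_', ' ', '。']
results = {ch: is_letter(ch) for ch in samples}
print(results)
```

In Python 3, only '热' and 'a' should come out as letters here, which is why a space next to "3" is left alone.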

You do not need the re.U flag since Unicode matching is on by default for str patterns in Python 3. You need it in Python 2 though.
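A rough way to see why Unicode-aware matching matters is to switch it off in Python 3 with re.ASCII. This only imitates ASCII-only matching and is not an exact reproduction of Python 2 behaviour; the variable names are illustrative.

```python
import re

text = "今天特别 热,但是我买了 3 个西瓜。"
pattern = r'(?<=[^\W\d_])\s+(?=[^\W\d_])'

# Default in Python 3: \W and \d are Unicode-aware, so 别 and 热 count as letters.
unicode_result = re.sub(pattern, '', text)

# With re.ASCII, \W matches every non-ASCII character, so no Chinese
# character satisfies [^\W\d_] and nothing is removed.
ascii_result = re.sub(pattern, '', text, flags=re.ASCII)

print(unicode_result)
print(ascii_result)
```

With ASCII-only matching the text should come back unchanged, since the lookarounds never find a letter next to the space.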

You may also use capturing groups:

re.sub(r'([^\W\d_])\s+([^\W\d_])', r'\1\2', text)

where the non-consuming lookarounds are turned into consuming capturing groups ((...)). The \1 and \2 in the replacement pattern are backreferences to the capturing group values.
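One difference between the two variants is worth checking on your own data: because capturing groups consume the letters they match, a letter cannot serve as the right side of one match and the left side of the next. The sketch below uses a made-up input with single letters separated by spaces to show this.

```python
import re

# Illustrative input: every letter is separated by one space.
spaced = "今 天 热"

# Lookarounds consume only the whitespace, so 天 can be checked
# on both sides of two different matches.
lookaround = re.sub(r'(?<=[^\W\d_])\s+(?=[^\W\d_])', '', spaced)

# Capturing groups consume 今 and 天 in the first match, so the
# second space no longer has an unconsumed letter to its left.
capturing = re.sub(r'([^\W\d_])\s+([^\W\d_])', r'\1\2', spaced)

print(lookaround)  # 今天热
print(capturing)   # 今天 热
```

If single letters separated by spaces can occur in the input, the lookaround version is the safer choice.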

See a Python 3 online demo:

import re
text = "今天特别 热,但是我买了 3 个西瓜。"
print(re.sub(r'(?<=[^\W\d_])\s+(?=[^\W\d_])', '', text))
# => 今天特别热,但是我买了 3 个西瓜。

Wiktor Stribiżew answered Nov 22 '25 19:11


