
Remove space delimited single characters

Tags: python, regex

I have texts that look like this:

the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト jumps over the lazy dog 跳過懶狗 best wishes : John Doe

What's a good regex (for Python) that removes the space-delimited single characters, so that the output looks like this:

the quick brown fox 狐狸 jumps over the lazy dog 跳過懶狗 best wishes John Doe

I've tried some combinations of \s{1}\S{1}\s{1}\S{1}, but they inevitably end up removing more letters than I need.

Eric L asked Dec 23 '22 15:12



1 Answer

You can replace the following with empty string:

(?<!\S)\S(?!\S).?

This matches a non-space character that has no non-space character on either side of it (i.e. it is bounded by whitespace or by the start/end of the string), plus the character after it (if any).

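I used negative lookarounds because they neatly handle the start/end-of-string cases: (?<!\S) and (?!\S) succeed both next to whitespace and at a string boundary. The trailing .? consumes the character that follows the \S, so the space after each single character is removed as well.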
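For reference, here is a minimal sketch of how the pattern above could be applied in Python with re.sub. The helper name remove_single_chars and the constant SINGLE_CHAR are my own, chosen for illustration; the pattern itself is the one from this answer.

```python
import re

# Pattern from the answer: a lone non-space character (bounded by whitespace
# or a string boundary), plus the one character that follows it, if any.
SINGLE_CHAR = re.compile(r"(?<!\S)\S(?!\S).?")


def remove_single_chars(text: str) -> str:
    """Hypothetical helper: drop space-delimited single characters."""
    return SINGLE_CHAR.sub("", text)


text = (
    "the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト "
    "jumps over the lazy dog 跳過懶狗 best wishes : John Doe"
)
cleaned = remove_single_chars(text)
print(cleaned)
```

Note that when a single character sits at the very end of the string, the space before it is left behind (for example, "hello a" becomes "hello "), so you may want to call .strip() on the result if that matters.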

Regex101 Demo

Sweeper answered Jan 01 '23 14:01
