I have texts that look like this:
the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト jumps over the lazy dog 跳過懶狗 best wishes : John Doe
What's a good regex (for Python) that can remove the single characters so that the output looks like this:
the quick brown fox 狐狸 jumps over the lazy dog 跳過懶狗 best wishes John Doe
I've tried some combinations of \s{1}\S{1}\s{1}\S{1}, but they inevitably end up removing more letters than I need.
You can replace matches of the following pattern with an empty string:
(?<!\S)\S(?!\S).?
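This matches a non-space character that has no non-space character on either side of it (i.e. it is bounded by whitespace or by the start/end of the string), plus the character after it (if any).
I used negative lookarounds because they neatly handle the start/end of string case: (?<!\S) and (?!\S) succeed both next to whitespace and at a string boundary, whereas requiring a literal \s on each side would fail there. The trailing .? matches the extra character that follows the \S, so the space after the single character is removed as well.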
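To try this in Python, here is a minimal sketch using re.sub with the pattern above. The names ISOLATED_CHAR and remove_single_chars are made up for illustration and are not part of the original answer.

```python
import re

# A lone non-whitespace character (no non-whitespace neighbour on either side),
# plus at most one following character, which is normally the space after it.
ISOLATED_CHAR = re.compile(r"(?<!\S)\S(?!\S).?")


def remove_single_chars(text: str) -> str:
    """Delete every isolated single character together with the space that follows it."""
    return ISOLATED_CHAR.sub("", text)


text = (
    "the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト "
    "jumps over the lazy dog 跳過懶狗 best wishes : John Doe"
)
print(remove_single_chars(text))
# the quick brown fox 狐狸 jumps over the lazy dog 跳過懶狗 best wishes John Doe
```

Note that a single character at the very end of the string is removed but the space before it stays, so you may want to call .strip() on the result.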