I often work with text files that use a variable number of spaces as word separators (word processors like Word do this to distribute the whitespace evenly, given the different letter widths in certain fonts, and they keep this annoying variable number of spaces even when saving as plain text).
I would like to automate replacing these variable-length runs of spaces with single spaces. I suspect a regex could do it, but there is also whitespace at the beginning of paragraphs (usually four spaces, but not always), which I want to leave unchanged. So my regex must not touch the leading whitespace, and this adds to the complexity.
I'm using vim, so a regex in the vim regex dialect would be very useful to me, if this is doable.
My current progress looks like this:
:%s/ \+/ /g
but it doesn't work correctly.
I'm also considering writing a Vim script that would go through the text line by line, process each line character by character and skip any spaces after the first one, but I have a feeling this would be overkill.
The basic construct of the command is s#search#replace#. Sometimes you see it as s///. The % before the s is a range that tells the command to work on all lines in the Vim buffer, not just the current one. The space followed by \+ matches one or more spaces.
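To see why the one-or-more-spaces pattern alone is not enough for the question above, here is a minimal sketch in Python rather than Vim. It assumes Python's `re` module as a stand-in for the Vim substitution; the function name and sample text are made up for illustration.

```python
import re

def collapse_all_spaces(text):
    # Rough counterpart of the Vim command with the pattern "space, one or more":
    # every run of spaces becomes a single space, wherever it occurs.
    return re.sub(r" +", " ", text)

sample = "    First  line   here\nSecond   line"
print(collapse_all_spaces(sample))
# The four leading spaces of the first line are collapsed as well,
# which is exactly what the question wants to avoid.
```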
The metacharacter "\s" matches whitespace characters and + means one or more occurrences, so the regular expression \s+ matches any run of whitespace (single or multiple). Therefore, to replace multiple spaces with a single space, substitute each match of \s+ with one space. (In Vim's default regex syntax the quantifier is written \+, so the pattern becomes \s\+.)
Simple sed commands are:
sed 's/  */ /g'
This will replace any run of one or more spaces with a single space (note the two spaces before the *; with only one space the pattern would also match the empty string).
sed 's/ $//'
This will replace any single space at the end of the line with nothing.
sed 's/ /,/g'
This will replace any single space with a single comma.
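For readers who want to try these three substitutions without sed, the sketch below mirrors them with Python's `re` module. It is only an approximation under the assumption that each input is a single line without a trailing newline; the function names are hypothetical.

```python
import re

def squeeze_spaces(line):
    # "space, then zero or more spaces": a run of one or more spaces
    return re.sub(r"  *", " ", line)

def drop_trailing_space(line):
    # Removes one space at the very end of the line, if there is one
    return re.sub(r" $", "", line)

def spaces_to_commas(line):
    # Every single space becomes a comma; no regex needed for a literal
    return line.replace(" ", ",")

print(squeeze_spaces("a   b  c"))
print(drop_trailing_space("abc "))
print(spaces_to_commas("a b c"))
```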
as an "any character" wildcard. If I read that correctly, then your replacement is a space. You can use :%s/\.//g if you want to delete the . characters instead of replacing them with spaces.
This will replace 2 or more spaces:
s/ \{2,}/ /g
Or you could add an extra space before the \+
to your version:
s/  \+/ /g
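The difference between "one or more" and "two or more" can be checked with a small Python sketch. It assumes `re.subn`, which returns the number of replacements alongside the result; the helper name is made up for illustration.

```python
import re

def collapse(line, pattern):
    # re.subn returns (new_string, number_of_replacements)
    return re.subn(pattern, " ", line)

line = "one two   three    four"
print(collapse(line, r" +"))      # single spaces are replaced by themselves too
print(collapse(line, r" {2,}"))   # only runs of two or more spaces are touched

# Both patterns still collapse leading indentation:
print(collapse("    indented  text", r" {2,}"))
```

The resulting text is the same for both patterns; the two-or-more form just skips the pointless replacement of a single space by a single space. Neither form protects the whitespace at the start of a line.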
This will do the trick:
%s![^ ]\zs \+! !g
Many substitutions can be done more easily in Vim than in other regex dialects by using the \zs and \ze meta-sequences. What they do is exclude part of the match from the final result, either the part before the sequence (\zs, "s" for "start here") or the part after (\ze, "e" for "end here"). In this case, the pattern must match one non-space character first ([^ ]) but the following \zs says that the final match result (which is what will be replaced) starts after that character.
Since there is no way to have a non-space character in front of line-leading whitespace, it will not be matched by the pattern, so the substitution will not replace it. Simple.
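Other regex dialects have no \zs, but a lookbehind gives a similar effect. The sketch below is an approximation in Python, not the Vim command itself: it assumes `re` with a fixed-width lookbehind, and it excludes the newline from the character class because, unlike Vim's line-by-line matching, Python sees the whole text at once. The function name is hypothetical.

```python
import re

def collapse_inner_spaces(text):
    # (?<=[^ \n]) requires a character that is neither a space nor a newline
    # just before the run of spaces, without making it part of the match.
    # Leading whitespace has no such character in front of it, so it is skipped.
    return re.sub(r"(?<=[^ \n]) +", " ", text)

sample = "    First  line   here\n    Second   line"
print(collapse_inner_spaces(sample))
```

The indentation of both lines survives while the runs between words shrink to one space each.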