Regex for replacing space with comma-space, except at end of line

Tags:

regex

perl

I am trying to convert the content of an input file from this:

NP_418770.2: 257-296 344-415 503-543 556-592 642-707
YP_026226.4: 741-779 811-890 896-979 1043-1077

to this:

NP_418770.2: 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4: 741-779, 811-890, 896-979, 1043-1077

i.e., replace each space with a comma and a space (leaving the newline alone).

For that, I have tried:

perl -pi.bak -e "s/[^\S\n]+/, /g" input.txt

but it gives:

NP_418770.2:, 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4:, 741-779, 811-890, 896-979, 1043-1077

How can I stop the extra comma that appears after ":" (I want ":" followed by a single space) without writing another regex?

Thanks


asked Nov 02 '16 06:11

J.Carter

2 Answers

Try using a regex negative lookbehind. It checks whether the character before the space is a colon (:), and if it is, that space is not matched.

s/(?<!:)[^\S\n]+/, /g


answered Nov 03 '22 05:11

Niyoko

You can use a word boundary to skip the space that follows the colon: s/\b\h+/, /g

It can be done with perl:

perl -pe's/\b\h+/, /g' file

but also with sed:

sed -E 's/\b[ \t]+/, /g' file

Another approach splits each line on the same pattern and prints the fields joined by ", ":

perl -F'\b\h+' -ane'BEGIN{$,=", "} print @F' file

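or do the same with GNU awk, where the word boundary is written \y (in awk, \b means backspace) and OFS only takes effect once a field is reassigned:

gawk -F'\\y[ \t]+' -v OFS=', ' '{$1 = $1} 1' file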

answered Nov 03 '22 06:11

Casimir et Hippolyte
