
Regex for replacing space with comma-space, except at end of line

Tags: regex, perl

I am trying to convert input file content like this:

NP_418770.2: 257-296 344-415 503-543 556-592 642-707
YP_026226.4: 741-779 811-890 896-979 1043-1077

to this:

NP_418770.2: 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4: 741-779, 811-890, 896-979, 1043-1077

i.e., replace each space with a comma followed by a space (leaving the newline alone).

For that, I have tried:

perl -pi.bak -e "s/[^\S\n]+/, /g" input.txt

but it gives:

NP_418770.2:, 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4:, 741-779, 811-890, 896-979, 1043-1077

How can I prevent the extra comma that appears after ":" (I want ":" followed by a single space) without writing another regex?

Thanks

asked Nov 02 '16 06:11 by J.Carter



2 Answers

Try a negative lookbehind. (?<!:) checks whether the character just before the whitespace is a colon (:), and if it is, that whitespace is not matched.

s/(?<!:)[^\S\n]+/, /g
answered Nov 03 '22 05:11 by Niyoko


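You can use a word boundary to skip the space that follows the colon. \b only matches between a word character and a non-word character, and since ":" and the space are both non-word characters, there is no boundary between them. \h matches horizontal whitespace (spaces and tabs) but not the newline: s/\b\h+/, /g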

It can be done with perl:

perl -pe's/\b\h+/, /g' file

but also with GNU sed (\b is a GNU sed extension):

sed -E 's/\b[ \t]+/, /g' file
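Some sed implementations have no \b. For those (for example BSD or macOS sed, an assumption about the target platform), one sketch captures the word character before the blanks and writes it back. This has the same effect as the word boundary here:

```shell
# Sample input copied from the question.
input='NP_418770.2: 257-296 344-415 503-543 556-592 642-707
YP_026226.4: 741-779 811-890 896-979 1043-1077'

# ([[:alnum:]_]) captures the word character just before the blanks and \1 puts it back,
# so blanks after ":" (not a word character) are never replaced.
result=$(printf '%s\n' "$input" | sed -E 's/([[:alnum:]_])[[:blank:]]+/\1, /g')
printf '%s\n' "$result"
```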

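Another approach uses the same pattern as the field separator and joins the fields with ", ". This needs -n with an explicit print @F, because with -p perl prints the unmodified $_ and $, is never used:

perl -F'\b\h+' -lane 'BEGIN{$,=", "} print @F' file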

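or do the same with GNU awk, whose word-boundary operator is \y (in awk, \b means backspace). Assigning $1=$1 rebuilds the record so that OFS is actually applied:

gawk 'BEGIN{FS="\\y[ \t]+"; OFS=", "} {$1=$1} 1' file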
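Word-boundary operators such as gawk's \y are not part of POSIX awk. A sketch that avoids regex field separators entirely assumes every line has the shape "ID: range range ...", so the ID and its colon form the first whitespace-separated field:

```shell
# Sample input copied from the question.
input='NP_418770.2: 257-296 344-415 503-543 556-592 642-707
YP_026226.4: 741-779 811-890 896-979 1043-1077'

# Default FS splits on runs of blanks, so $1 is "NP_418770.2:" and $2.. are ranges.
# Lines with fewer than two fields are printed unchanged.
# Otherwise rebuild the line: a plain space after $1, then ", " between ranges.
result=$(printf '%s\n' "$input" | awk 'NF < 2 { print; next }
{ out = $1 " " $2; for (i = 3; i <= NF; i++) out = out ", " $i; print out }')
printf '%s\n' "$result"
```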
answered Nov 03 '22 06:11 by Casimir et Hippolyte