I am trying to convert the content of an input file from this:
NP_418770.2: 257-296 344-415 503-543 556-592 642-707
YP_026226.4: 741-779 811-890 896-979 1043-1077
to this:
NP_418770.2: 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4: 741-779, 811-890, 896-979, 1043-1077
i.e., replace each space with a comma and a space (leaving newlines alone).
For that, I have tried:
perl -pi.bak -e "s/[^\S\n]+/, /g" input.txt
but it gives:
NP_418770.2:, 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4:, 741-779, 811-890, 896-979, 1043-1077
How can I avoid the additional comma that appears after ":" (I want ":" followed by a single space) without writing another regex?
Thanks
You can match a space character with just the space character; [^ ] matches anything but a space character.
The most common forms of whitespace you will use with regular expressions are the space (␣), the tab (\t), the new line (\n) and the carriage return (\r) (useful in Windows environments), and these special characters match each of their respective whitespaces.
So, this regular expression means "At the start of the string ( ^ ), match any character that's not a comma ( [^,] ) one or more times ( + ) until we reach the end of the string ( $ )."
Try a negative lookbehind. It checks whether the character before the whitespace is a colon (:) and, if so, does not match that whitespace.
s/(?<!:)[^\S\n]+/, /g
You can play with the word-boundary to discard the space that follows the colon (there is no word boundary between ":" and a space, since neither is a word character): s/\b\h+/, /g
It can be done with perl:
perl -pe's/\b\h+/, /g' file
but also with GNU sed (\b is a GNU extension):
sed -E 's/\b[ \t]+/, /g' file
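Another approach uses the field separator. Note that -p prints $_ unchanged, so the fields have to be printed explicitly with -n for $, to take effect:
perl -F'\b\h+' -ane'BEGIN{$,=", "} print @F' file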
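or do the same with GNU awk (there \b means backspace and the word boundary is \y; assigning $1=$1 forces the record to be rebuilt with OFS):
awk -F'\\y[ \t]+' -v OFS=', ' '{$1=$1} 1' file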