Hi everyone, my data looks like this:
samplename 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ...
samplename2 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 ...
and I want it to look like this:
>samplename
0 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0 ...
>samplename2
0 0 0 0 0 1 1 1 1 1
1 1 1 1 1 1 0 0 0 ...
[note - showing a line break after every 10 digits; I actually want it after every 200, but I realize that showing a line like that would not be very helpful].
I could do this with a regular expression in a text editor, but I want to use the sed command in bash because I have to do this several times and I need 200 digits per row.
I tried this but got an error:
sed -e "s/\(>\w+\)\s\([0-9]+\)/\1\n\2" < myfile > myfile2
sed: 1: "s/(>\w+)\s([0-9]+)/ ...": unescaped newline inside substitute pattern
One more note: I am doing this on a Mac, and I know that sed on the Mac is a little bit different from GNU sed. If you are able to give me a solution that works on a Mac, that would be great.
Thanks in advance.
With your added request for a line break after 200 numbers, you are much better off using awk.
echo "hello 1 2 3 4" | awk '{print ">"$1; for(i=2; i<=NF; i++) {printf("%d ",$i); if((i+1)%2 == 0) printf("\n");}}'
prints out
>hello
1 2
3 4
If you want this to work only on lines that start with hello, you can modify as
echo "hello 1 2 3 4" | awk '/^hello / {print ">"$1; for(i=2; i<=NF; i++) {printf("%d ",$i); if((i+1)%2 == 0) printf("\n");}}'
(The regular expression between the slashes says "only do this on lines that match this expression".)
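You can change the test if ((i + 1) % 2 == 0) to if ((i - 1) % 200 == 0) to get a newline after every 200 numbers; I only used 2 because the printout is more readable. Note the switch to i - 1: the loop starts at i = 2, so i - 1 is the count of numbers printed so far. For a group size of 2, (i + 1) % 2 and (i - 1) % 2 give the same result, but for larger sizes only i - 1 breaks in the right place.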
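If the group size changes often, it may be easier to pass it in as a variable than to edit the program each time. Here is a minimal sketch, assuming space-separated input with the sample name in the first field. The function name wrap_fields and the variable n are names I made up for illustration. It prints with %s rather than %d, so non-numeric tokens pass through unchanged.

```shell
# wrap_fields N [FILE...]: print ">name", then the remaining fields N per row.
# wrap_fields and n are hypothetical names chosen for this sketch.
wrap_fields() {
    n=$1; shift
    awk -v n="$n" '{
        print ">" $1
        for (i = 2; i <= NF; i++) {
            # i - 1 is the count of values printed so far for this record;
            # end the row after every n values and after the last value.
            printf "%s%s", $i, (((i - 1) % n == 0 || i == NF) ? "\n" : " ")
        }
    }' "$@"
}

printf 'hello 1 2 3 4 5\n' | wrap_fields 2
```

With a group size of 2 this prints >hello, then the rows "1 2", "3 4" and "5". For the real data you would call something like wrap_fields 200 inputFile > outputFile.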
Update: to make this all much cleaner, do the following.
Create a file called breakIt with the following contents (leave out the /^hello/ if you don't want to select only lines starting with "hello", but leave the {} around the code; it matters):
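/^hello/ { print ">"$1;
    for(i=2; i<=NF; i++)
    {
        printf("%d ",$i);
        # i-1 is the count of numbers printed so far:
        # break after every 200 numbers, and after the last number on the line
        if((i-1)%200 == 0 || i == NF) printf("\n");
    }
}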
Now you can issue the command
awk -f breakIt inputFile > outputFile
This says "use the contents of breakIt as the commands to process inputFile and put the results in outputFile".
Should do the trick nicely for you.
Edit: just in case you really do want a sed solution, here is a nice one (well, I think so). Copy the following into a file called sedSplit:
s/^([^ ]+) />\1\
/
s/(([0-9] ){10})/\1\
/g
s/$/\
/
This has three consecutive sed commands. Each is a single command, but since each replacement contains a newline (written as a backslash at the end of the line), together they take up six lines.
s/^ - substitute, starting from the beginning of the line
([^ ]+) / - match the first word (everything up to the first space, so names such as samplename2 work too) and the space after it, replacing with
>\1\
/ - the literal '>', then the first match, then a newline
s/(([0-9] ){10})/ - match 10 repetitions of [digit followed by space]; change 10 to 200 for 200 numbers per row
\1\
/g - replace with itself, followed by a newline, as often as needed (g)
s/$/\
/ - replace the 'end of line' with a newline, which leaves a blank line after each group
You invoke this sed script like this:
sed -E -f sedSplit < inputFile > outputFile
This uses the
-E flag (use extended regular expressions - no need for escaping brackets and such)
-f flag ('get instructions from this file')
It makes the whole thing much cleaner, and gives you the output you asked for on a Mac (even with an extra blank line to separate the groups; if you don't want that, leave out the last two lines of sedSplit).
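Whichever version you use, it may be worth a quick sanity check that no row came out longer than intended. A rough sketch, assuming header lines start with '>' and the values are separated by spaces; check_width is a hypothetical helper name, not a standard tool:

```shell
# check_width MAX [FILE...]: report any non-header row with more than MAX values.
# check_width is a hypothetical helper name chosen for this sketch.
# Example call (outputFile as in the commands above): check_width 200 outputFile
check_width() {
    max=$1; shift
    awk -v max="$max" '
        # skip ">name" header lines; flag data rows that are too wide
        !/^>/ && NF > max { printf "line %d has %d values\n", NR, NF; bad = 1 }
        END { exit bad + 0 }   # non-zero exit status if any row was too wide
    ' "$@"
}
```

It prints nothing and exits with status 0 when every row fits, so it can be chained after the awk or sed command with &&.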
$ awk '{print ">" $1; for (i=2;i<=NF;i++) printf "%s%s", $i, ((i-1)%10 && i<NF ? FS : RS)}' file
>samplename
0 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0 ...
>samplename2
0 0 0 0 0 1 1 1 1 1
1 1 1 1 1 1 0 0 0 ...