Given a space separated file as such:
0.0 1:0.000000 2:1.000000 3:0.000000 4:0.000000 5:1.000000 6:0.000000 7:1.000000 8:1.000000 9:1.000000 10:1.000000 11:1.000000 12:1.000000 13:1.000000 14:1.000000 15:0.919033 16:1.000000 17:1.000000 18:1.000000 19:1.000000 20:0.000000 21:0.037771
0.0 1:0.000000 2:1.000000 3:0.000000 4:0.000000 5:1.000000 6:0.000000 7:1.000000 8:0.800000 9:0.666667 10:1.000000 11:0.800000 12:0.666667 13:1.000000 14:0.875000 15:0.874574 16:0.848662 17:0.901802 18:0.938795 19:0.903077 20:0.333332 21:0.196682
0.0 1:1.098612 2:1.000000 3:1.000000 4:0.000000 5:1.000000 6:0.000000 7:1.000000 8:0.800000 9:0.500000 10:0.000000 11:0.800000 12:0.500000 13:0.000000 14:0.909091 15:0.780985 16:0.792052 17:0.865396 18:0.863982 19:0.832962 20:0.000000 21:0.069470
0.0 1:0.000000 2:1.000000 3:0.000000 4:0.000000 5:1.000000 6:0.000000 7:1.000000 8:0.923077 9:0.909091 10:0.888889 11:0.923077 12:0.909091 13:0.888889 14:0.943396 15:0.923562 16:0.923871 17:0.949357 18:0.950790 19:0.944919 20:0.142857 21:0.140054
The first columns are all 0.0
and we want to throw that first column away. Then for each column, there's a colon separating the key from its value. And the goal is only to keep the value.
I can do it as such in python
:
with io.open(infile, 'r') as fin:
for line in fin:
line = line.split()[1:]
line = '\t'.join([i.split(':')[1] for i in line])
print line
[out]:
0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 1.000000 1.000000 1.000000 1.000000 1.0000001.000000 1.000000 1.000000 0.919033 1.000000 1.000000 1.000000 1.000000 0.000000 0.037771
0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 1.000000 0.800000 0.666667 1.000000 0.8000000.666667 1.000000 0.875000 0.874574 0.848662 0.901802 0.938795 0.903077 0.333332 0.196682
1.098612 1.000000 1.000000 0.000000 1.000000 0.000000 1.000000 0.800000 0.500000 0.000000 0.8000000.500000 0.000000 0.909091 0.780985 0.792052 0.865396 0.863982 0.832962 0.000000 0.069470
0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 1.000000 0.923077 0.909091 0.888889 0.9230770.909091 0.888889 0.943396 0.923562 0.923871 0.949357 0.950790 0.944919 0.142857 0.140054
But how is the same possible on the unix command line? (maybe with sed
, awk
, perl -c
or even python -c
or anything else) Imagine it's a large file, so please don't load the whole file it memory, unless there's an economical reason for that.
With GNU sed:
sed 's/^0.0 //;s/[0-9]\+:\([0-9.]\+\)/\1/g' file
Output:
0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.919033 1.000000 1.000000 1.000000 1.000000 0.000000 0.037771 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 1.000000 0.800000 0.666667 1.000000 0.800000 0.666667 1.000000 0.875000 0.874574 0.848662 0.901802 0.938795 0.903077 0.333332 0.196682 1.098612 1.000000 1.000000 0.000000 1.000000 0.000000 1.000000 0.800000 0.500000 0.000000 0.800000 0.500000 0.000000 0.909091 0.780985 0.792052 0.865396 0.863982 0.832962 0.000000 0.069470 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 1.000000 0.923077 0.909091 0.888889 0.923077 0.909091 0.888889 0.943396 0.923562 0.923871 0.949357 0.950790 0.944919 0.142857 0.140054
If you want to edit your file "in place" add sed's option -i
.
Two sed scripts separated with ;
:
s/^0.0 //
: search from start of line (^
) for0.0
followed by a whitespace and replace it by nothing
s/[0-9]\+:\([0-9.]\+\)/\1/g
: search for any at least one (+
) character from range0
to9
followed by:
and followed by at least one (+
) character from range0
to9
or a.
and replace it by matching part in round brackets.\1
is the back references to matching part in round brackets.g
means global to apply the replacement to all matches to the regexp, not just the first. It is necessary to escape special characters (here: +
,(
,)
) for sed with a\
.
Shorter version:
sed 's/^0.0 //;s/[0-9]\+://g' file
See: The Stack Overflow Regular Expressions FAQ
awk
to the rescue!
$ awk '{gsub("[^ ]*:","");sub("[^ ]* ","")}1' file
0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.919033 1.000000 1.000000 1.000000 1.000000 0.000000 0.037771
0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 1.000000 0.800000 0.666667 1.000000 0.800000 0.666667 1.000000 0.875000 0.874574 0.848662 0.901802 0.938795 0.903077 0.333332 0.196682
1.098612 1.000000 1.000000 0.000000 1.000000 0.000000 1.000000 0.800000 0.500000 0.000000 0.800000 0.500000 0.000000 0.909091 0.780985 0.792052 0.865396 0.863982 0.832962 0.000000 0.069470
0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 1.000000 0.923077 0.909091 0.888889 0.923077 0.909091 0.888889 0.943396 0.923562 0.923871 0.949357 0.950790 0.944919 0.142857 0.140054
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With