I have a text file with the following characteristics:
I have appended some notes to some of the lines with tentative suggestions for changes to be made to the original words, and now would like to use sed to make those changes for me. So, to give a clearer picture, my file looks like this:
NO NO O
SIGNS NN O #NNS
GIVEN VBD B-VP #VBN
AT IN O
THIS NN O
TIME NN O ## B-NP
. PER O
...
Notes with 1 # are to replace the SECOND word in a line, and notes with 2 #'s are to replace the THIRD word in a line. Would anybody be able to suggest a way to do this with sed (or awk, or anything else)? Again to clarify (hopefully), my goal is to get the pattern following the # or ## and replace the nth word of the line with the matched pattern.
Thanks.
This will work for you:
awk '/#/{sub(/# +/,"#");n=gsub(/#/,"",$NF);$(n+1)=$NF;$NF="\t\t#"}1' file
/#/{ ... }: Search for lines that contain # and perform the following steps...
sub(/# +/,"#"): Remove all spaces between the notes and the # if necessary
n=gsub(/#/,"",$NF): Remove all # from the last field $NF and set the number of #'s removed to the variable n
$(n+1)=$NF: Set the n+1 field $(n+1) to the new last field $NF, which has all the # stripped off
$NF="\t\t#": Set the last field $NF to two tabs followed by a #
1: Shortcut to tell awk to print the altered line
file: Your input file

$ awk '/#/{sub(/# +/,"#");n=gsub(/#/,"",$NF);$(n+1)=$NF;$NF="\t\t#"}1' file
NO NO O
SIGNS NNS O #
GIVEN VBN B-VP #
AT IN O
THIS NN O
TIME NN B-NP #
. PER O
...
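To see how the count returned by gsub selects the field to overwrite, here is a small sketch that prints n and the stripped note for each annotated line. It assumes input shaped like the sample in the question (recreated here with printf); the function name show_counts is made up for illustration.

```shell
# Hypothetical helper: for each annotated line, report how many # marks
# were stripped from the last field (n) and which field gets overwritten.
show_counts() {
  awk '/#/ {
    sub(/# +/, "#")               # join "## B-NP" into "##B-NP"
    n = gsub(/#/, "", $NF)        # gsub returns the number of replacements
    print $1 ": n=" n ", field " (n + 1) " becomes " $NF
  }' "$@"
}

printf '%s\n' 'NO NO O' 'SIGNS NN O #NNS' 'GIVEN VBD B-VP #VBN' 'TIME NN O ## B-NP' | show_counts
# SIGNS: n=1, field 2 becomes NNS
# GIVEN: n=1, field 2 becomes VBN
# TIME: n=2, field 3 becomes B-NP
```

Lines without a # never enter the block, so only the annotated lines are reported.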
Note: If you make sure your notes always follow the # with zero spaces in between, you can remove the entire sub(/# +/,"#"); part of the command to make it even shorter.
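As a sketch of that shorter form, assuming every note is attached directly to its # marks (for example ##B-NP rather than ## B-NP), the command could look like this. The wrapper function apply_notes is only an illustrative name.

```shell
# Assumes notes are attached directly to the # marks, e.g. "##B-NP".
apply_notes() {
  awk '/#/ {
    n = gsub(/#/, "", $NF)   # count and strip the # marks from the last field
    $(n + 1) = $NF           # overwrite word n+1 with the note
    $NF = "\t\t#"            # leave a bare # marker in place of the note
  } 1' "$@"
}

printf '%s\n' 'NO NO O' 'SIGNS NN O #NNS' 'TIME NN O ##B-NP' | apply_notes
```

If a note is separated from its # by a space, this variant would treat the note alone as the last field and find no # in it, so the sub call is still needed for input like the original sample.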