
sed with simultaneous and sequential replace

Tags: string, bash, sed, awk

I'm not sure whether what I want is possible in sed (or awk, or any other bash tool):

I want to write a script that replaces : ) in a string with <happy> and ) : with <sad>. Each replacement on its own is easy with sed:

echo "test : )" | sed 's/: )/<happy>/g'
echo "test ) :" | sed 's/) :/<sad>/g'

Unfortunately, sometimes I have strings like these:

I'm happy : ) : ) : )
I'm sad ) : ) : ) :

In that case, the output should be:

I'm happy <happy> <happy> <happy>
I'm sad <sad> <sad> <sad>

But by combining the two commands above:

echo "I'm happy : ) : ) : )" | sed 's/: )/<happy>/g' | sed 's/) :/<sad>/g'
echo "I'm sad ) : ) : ) :" | sed 's/: )/<happy>/g' | sed 's/) :/<sad>/g'

I will get:

I'm happy <happy> <happy> <happy>
I'm sad ) <happy> <happy> :

The way to solve this would be to do both replacements in parallel, scanning the string from left to right. I tried something like sed 's/a/b/g;s/c/d/g', but that still applies one pattern after the other over the whole line, so it doesn't solve the problem.
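
For concreteness, the single-command form fails in the same way as the two piped commands. The sketch below only reproduces the problem; the function name replace_sequential is a label made up for this example.

```shell
# Hypothetical helper: applies both substitutions in one sed call.
# The first s command rewrites the whole line before the second one runs,
# so this is still sequential, not simultaneous.
replace_sequential() {
    sed 's/: )/<happy>/g;s/) :/<sad>/g'
}

printf '%s\n' "I'm sad ) : ) : ) :" | replace_sequential
# prints: I'm sad ) <happy> <happy> :
```

By the time the second substitution runs, every ) : that overlapped a : ) has already been consumed by the first one.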

asked Aug 16 '18 22:08 by dhokas


1 Answer

With GNU awk for the 3rd arg to match():

$ cat script1.awk
BEGIN {
    map[": )"] = "<happy>"
    map[") :"] = "<sad>"
}
{
    while ( match($0,/(.*)(: \)|\) :)(.*)/,a) ) {
        $0 = a[1] map[a[2]] a[3]
    }
    print
}

$ awk -f script1.awk file
I'm happy <happy> <happy> <happy>
I'm sad <sad> <sad> <sad>

With any awk:

$ cat script2.awk
BEGIN {
    map[": )"] = "<happy>"
    map[") :"] = "<sad>"
}
{
    while ( match($0,/: \)|\) :/) ) {
        $0 = substr($0,1,RSTART-1) map[substr($0,RSTART,RLENGTH)] substr($0,RSTART+RLENGTH)
    }
    print
}

$ awk -f script2.awk file
I'm happy <happy> <happy> <happy>
I'm sad <sad> <sad> <sad>
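
One thing to keep in mind with the loop above: it rescans the whole record after every replacement, so it would loop forever if a replacement string itself contained one of the patterns. That cannot happen with <happy> and <sad>, but a possible variant that never rescans replaced text is sketched below. The wrapper name replace_single_pass and the variable out are made up for this example.

```shell
# Hypothetical wrapper around a single-pass variant of script2.awk.
replace_single_pass() {
    awk '
    BEGIN {
        map[": )"] = "<happy>"
        map[") :"] = "<sad>"
    }
    {
        out = ""
        # Move the text before each match, plus its replacement, into out
        # and keep only the unscanned tail in $0, so replaced text is
        # never matched again.
        while ( match($0, /: \)|\) :/) ) {
            out = out substr($0, 1, RSTART-1) map[substr($0, RSTART, RLENGTH)]
            $0 = substr($0, RSTART+RLENGTH)
        }
        print out $0
    }'
}

printf '%s\n' "I'm happy : ) : ) : )" "I'm sad ) : ) : ) :" | replace_single_pass
# prints:
# I'm happy <happy> <happy> <happy>
# I'm sad <sad> <sad> <sad>
```

For the inputs in the question this gives the same result as script2.awk; the difference only shows up when a replacement could itself be matched.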

Although both approaches produce the same output in this case, the first approach actually works from the end of the string to the front, because the greedy leading .* consumes as much as possible before the emoticon is matched, while the second approach works front to back. You can see that with this test:

$ echo ': ) :' | awk -f script1.awk
: <sad>

$ echo ': ) :' | awk -f script2.awk
<happy> :

With a small tweak you can do a back-to-front pass with any awk too, but I don't think that's what you really want anyway.
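
For reference, here is one way such a tweak might look. It is only a sketch and not necessarily the tweak meant above: it assumes every emoticon is exactly 3 characters long (the variable len), and the wrapper name replace_back_to_front is made up for this example.

```shell
# Hypothetical wrapper: back-to-front replacement without the GNU-only
# third argument to match().
replace_back_to_front() {
    awk '
    BEGIN {
        map[": )"] = "<happy>"
        map[") :"] = "<sad>"
        len = 3   # assumption: all emoticons have this length
    }
    {
        # The leading .* makes the match end at the last emoticon in the
        # record, so the emoticon is the final len characters of the match.
        while ( match($0, /.*(: \)|\) :)/) ) {
            end = RSTART + RLENGTH - 1
            $0 = substr($0, 1, end-len) map[substr($0, end-len+1, len)] substr($0, end+1)
        }
        print
    }'
}

printf '%s\n' ': ) :' | replace_back_to_front
# prints: : <sad>
```

This reproduces the script1.awk result for the test input. If the emoticons had different lengths, the fixed len would no longer be enough to find where the last emoticon starts.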


Edit: to build the regexp from the map instead of hard-coding it:

$ cat tst.awk
BEGIN {
    map[": )"] = "<happy>"
    map[") :"] = "<sad>"
    for (emoji in map) {
        gsub(/[^^]/,"[&]",emoji)
        gsub(/\^/,"\\^",emoji)
        emojis = (emojis == "" ? "" : emojis "|") emoji
    }
}
{
    while ( match($0,emojis) ) {
        $0 = substr($0,1,RSTART-1) map[substr($0,RSTART,RLENGTH)] substr($0,RSTART+RLENGTH)
    }
    print
}

$ awk -f tst.awk file
I'm happy <happy> <happy> <happy>
I'm sad <sad> <sad> <sad>
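
To see what the two gsub() calls in the BEGIN block produce, the escaping step can be run on its own. This is a small sketch; the wrapper name emoji_to_regexp is made up, and passing the text with -v assumes it contains no backslashes (awk would interpret them as escape sequences).

```shell
# Hypothetical helper: prints the regexp fragment that the BEGIN block
# of tst.awk would build for one emoticon.
emoji_to_regexp() {
    awk -v emoji="$1" '
    BEGIN {
        gsub(/[^^]/, "[&]", emoji)   # wrap every non-caret character in [ ]
        gsub(/\^/, "\\^", emoji)     # carets get a backslash instead
        print emoji
    }'
}

emoji_to_regexp ': )'
# prints: [:][ ][)]
```

Inside a bracket expression characters such as ) lose their special meaning, so each emoticon is matched literally. A caret is the exception, because [^] would start a negated set, which is why it is backslash-escaped instead.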

answered Sep 18 '22 15:09 by Ed Morton