Parsing simple string with awk or sed in linux

Tags: parsing, sed, awk

original string :
A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/

The depth of the directories varies, but the /trunk part is always present. The single character in front of /trunk identifies each line.

desired output :

A /trunk/apple
B /trunk/apple
Z /trunk/orange
Q /trunk/melon/juice/venti/straw

*** edit
I'm sorry, I made a mistake by adding a slash at the end of each path in the original string, which made the output confusing. The original string didn't have the slash in front of the capital letter, but I'll leave it as is.

my attempt :

echo $str1 | sed 's/\(.\/trunk\)/\n\1/g'

I feel like it should work but it doesn't.
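
A note on the attempt: with GNU sed, the substitution does start a new line before each marker, but it never inserts the space, and it leaves an empty first line plus the slash that separated the paths. A minimal sketch of one way to adjust it, assuming GNU sed (for `\n` in the replacement), a sample string with trailing slashes, and a hypothetical helper name `split_trunk`:

```shell
# Hypothetical helper: takes the string as $1, prints one "<marker> /trunk/..." per line.
split_trunk() {
  printf '%s\n' "$1" |
    # 1) consume an optional separating "/", then put a newline before the
    #    marker and a space between the marker and "/trunk"
    # 2) drop the newline that ends up at the very start
    # 3) drop a trailing "/" at the very end, if there is one
    sed -E 's#/?(.)(/trunk)#\n\1 \2#g; s#^\n##; s#/$##'
}

split_trunk 'A/trunk/apple/B/trunk/apple/Z/trunk/orange/'
```

The optional `/?` in front of the marker is what removes the slash between two paths; everything else is the original idea of breaking the line before `<char>/trunk`.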

asked Dec 03 '22 16:12 by Lunartist


2 Answers

With GNU awk for multi-char RS and RT:

$ awk -v RS='([^/]+/){2}[^/\n]+' 'RT{sub("/",OFS,RT); print RT}' file
A trunk/apple
B trunk/apple
Z trunk/orange

I'm setting RS to a regexp describing each string you want to match, i.e. 2 repetitions of non-/s followed by / and then a final string of non-/s (and non-newline for the last string on the input line). RT is automatically set to each of the matching strings, so then I just change the first / to a blank and print the result.

If each path isn't always 3 levels deep but does always start with something/trunk/, e.g.:

$ cat file
A/trunk/apple/banana/B/trunk/apple/Z/trunk/orange

then:

$ awk -v RS='[^/]+/trunk/' 'RT{if (NR>1) print pfx $0; pfx=gensub("/"," ",1,RT)} END{printf "%s%s", pfx, $0}' file
A trunk/apple/banana/
B trunk/apple/
Z trunk/orange
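
RT and a regexp RS are GNU awk extensions. For other awks, the same split can be sketched with a `match()` loop, which is standard awk. This is only a sketch under assumptions: the helper name `split_paths` is hypothetical, the marker is taken to be whatever non-/ text precedes `/trunk/`, and the output keeps the leading slash (`A /trunk/...`) and drops a trailing slash, which differs slightly from the output above.

```shell
# Hypothetical helper: reads lines on stdin, prints one "<marker> /trunk/<path>" per entry.
split_paths() {
  awk '{
    rest = $0
    found = 0
    # Dynamic regex strings avoid having to escape "/" inside a regex literal.
    while (match(rest, "[^/]+/trunk/")) {
      if (found) {
        # Text before this match is the tail of the previous path.
        tail = substr(rest, 1, RSTART - 1)
        sub("/$", "", tail)
        print marker " /trunk/" tail
      }
      # The match is "<marker>/trunk/"; drop the 7 characters of "/trunk/".
      marker = substr(rest, RSTART, RLENGTH - 7)
      rest = substr(rest, RSTART + RLENGTH)
      found = 1
    }
    if (found) {
      # Whatever is left after the last marker belongs to the last path.
      sub("/$", "", rest)
      print marker " /trunk/" rest
    }
  }'
}

printf '%s\n' 'A/trunk/apple/banana/B/trunk/apple/Z/trunk/orange' | split_paths
```

Like the RS version, this treats every `something/trunk/` as the start of a new entry, so a directory literally named trunk deeper in a path would be split as well.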
answered Dec 26 '22 10:12 by Ed Morton


To handle more complex input, where any number of /-separated values may follow trunk on a single line, try the following.

awk '
{
  gsub(/[^/]*\/trunk/,OFS"&")
  sub(/^ /,"")
  sub(/\//,OFS"&")
  gsub(/ +[^/]*\/trunk\/[^[:space:]]+/,"\n&")
  sub(/\n/,OFS)
  gsub(/\n /,ORS)
  gsub(/\/trunk/,OFS"&")
  sub(/[[:space:]]+/,OFS)
}
1
'  Input_file

Explanation: the same program with a comment on each step.

awk '                                            ## Start of the awk program.
{
  gsub(/[^/]*\/trunk/,OFS"&")                    ## Put OFS (a space) before every run of non-/ characters that is followed by /trunk.
  sub(/^ /,"")                                   ## Remove the leading space that the previous step added.
  sub(/\//,OFS"&")                               ## Put OFS before the first /.
  gsub(/ +[^/]*\/trunk\/[^[:space:]]+/,"\n&")    ## Put a newline before each run of spaces, optional non-/ characters, /trunk/ and the non-whitespace text after it.
  sub(/\n/,OFS)                                  ## Replace the first newline with OFS.
  gsub(/\n /,ORS)                                ## Replace every remaining newline followed by a space with ORS.
  gsub(/\/trunk/,OFS"&")                         ## Put OFS before every /trunk.
  sub(/[[:space:]]+/,OFS)                        ## Replace the first run of whitespace with a single OFS.
}
1                                                ## Print the (possibly edited) line.
'  Input_file                                    ## Name of the input file.


With the samples shown, try the following awk code.

awk '{gsub(/\/trunk/,OFS "&");gsub(/trunk\/[^/]*\//,"&\n")} 1' Input_file
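
To see what the two gsub calls do, here is the same idea spelled out step by step. It is a sketch under assumptions: the helper name `split_shallow` is hypothetical, the regexes are written as strings instead of `/.../` literals so that no slash needs escaping, and the sample input ends every path with a slash.

```shell
# Hypothetical helper: reads lines on stdin and applies the two substitutions.
split_shallow() {
  awk '{
    gsub("/trunk", OFS "&")          # "A/trunk" becomes "A /trunk"
    gsub("trunk/[^/]*/", "&\n")      # newline after each "trunk/<dir>/"
  } 1'
}

printf '%s\n' 'A/trunk/apple/B/trunk/apple/Z/trunk/orange/' | split_shallow
```

The second substitution breaks the line after exactly one directory below trunk, so it fits input where every path is one level deep and ends with a slash; a deeper path such as apple/banana/ would be cut after apple/. The trailing slashes stay in the output, and the final newline added by the substitution leaves an empty last line.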
answered Dec 26 '22 11:12 by RavinderSingh13