Sorting lines in a file alphabetically using awk and/or sed

Tags: sed, awk

I have a file with several hundred lines formatted like this:

#blah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/xyz(|/)$ http://www.blah.com/404.html [R=301,L,NC]

#xblah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/hkf(|/)$ http://www.blah.com/404.html [R=301,L,NC]

#ablah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/abc/.*(|/)$ http://www.blah.com/404.html [R=301,L,NC]

I would like to write an awk/sed script that alphabetizes this file by the second argument on the third line of each group of text. In this file that is "abc", "hkf" or "xyz", but it could be anything: these are the redirects being created in this Apache redirects file.

I figured that what I wanted to do was:

  1. concatenate each group of three lines into one line with a delimiter between each line
  2. sort the lines using sort -k3,3
  3. then re-assemble the 3 line constructs with a separating blank line
  4. write to file

My expected output would look like this:

#ablah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/abc/.*(|/)$ http://www.blah.com/404.html [R=301,L,NC]

#xblah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/hkf(|/)$ http://www.blah.com/404.html [R=301,L,NC]

#blah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/xyz(|/)$ http://www.blah.com/404.html [R=301,L,NC]

Does this make sense? Is there a better way to do this?

P.S. My intent is to make the script portable so it can be used on several files with this structure. When suggesting code, please spell it out as clearly as possible so that a rank newb like me can start to understand how to tackle this problem efficiently and extend the end result.

Any and all help greatly appreciated.

user3043123 asked Nov 27 '13 19:11



3 Answers

You can do the whole operation in GNU Awk (gawk); the three-argument match() and asort() used below are gawk extensions:

awk -f sort.awk input.txt

where sort.awk is

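BEGIN {
    RS=""    # paragraph mode: each blank-line-separated block is one record
}
{
    # capture the path after "RewriteRule ^/" up to the "(|" into a[1]
    # (the third argument to match() is a gawk extension)
    match($0,/RewriteRule \^\/(.*)\(\|/,a)
    key[NR]=a[1] "\t" NR    # sort key, tagged with the record number
    block[NR]=$0
}

END {
    asort(key)    # gawk extension: sort the values of key
    for (i=1; i<=NR; i++) {
        split(key[i],a,"\t")    # a[2] is the original record number
        print block[a[2]]
        printf "\n"
    }
}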

Produces:

#ablah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/abc/.*(|/)$ http://www.blah.com/404.html [R=301,L,NC]

#xblah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/hkf(|/)$ http://www.blah.com/404.html [R=301,L,NC]

#blah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/xyz(|/)$ http://www.blah.com/404.html [R=301,L,NC]
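
If gawk is not available, the same idea can be sketched as a decorate/sort/undecorate pipeline using only POSIX awk and sort, which is close to the plan outlined in the question. This is a minimal sketch under some assumptions: the function name sort_blocks is made up for illustration, blocks are separated by blank lines, the data contains no tab or \001 characters, and the sort key is the whole second word of the RewriteRule line (for example "^/abc/.*(|/)$") rather than just "abc".

```shell
# Hypothetical helper (not from the original answer): sort_blocks FILE
# Prints the blank-line-separated blocks of FILE ordered by the second
# word of each block's RewriteRule line.
sort_blocks() {
    # Step 1 (decorate): one output line per block, "KEY<TAB>joined block".
    # Assumes the data contains no tab or \001 characters.
    awk '
    BEGIN { RS = ""; FS = "\n" }    # paragraph mode, one field per line
    {
        key = ""
        line = $1
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^RewriteRule[ \t]/) {
                split($i, word, /[ \t]+/)
                key = word[2]       # e.g. ^/abc/.*(|/)$
            }
            if (i > 1) line = line "\001" $i
        }
        print key "\t" line
    }' "$1" |
    # Step 2: sort on the key column only (C locale for a stable order).
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
    # Step 3 (undecorate): drop the key and split the block back into lines.
    awk '
    BEGIN { FS = "\001"; OFS = "\n" }
    {
        sub(/^[^\t]*\t/, "")        # remove "KEY<TAB>"; $0 is re-split
        $1 = $1                     # rebuild the record using OFS
        if (NR > 1) print ""        # blank line between blocks
        print
    }'
}
```

Usage would look like `sort_blocks input.txt > sorted.txt`, where both file names are placeholders. Because the key is taken from whichever line starts with RewriteRule, blocks with extra comment lines should still sort, though that case was not part of the question's sample data.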

Håkon Hægland answered Oct 01 '22 12:10


Your idea seems like a simple enough approach. The following works for me on your test data, although it adds extra blank lines that I haven't sorted out yet:

awk '/^#/,/^$/ {printf "%s\0",$0} /^$/ {print ""} END {print ""}' 20250937.input | sort -t'\0' -k3,3 | tr '\0' '\n'

  1. For all lines between /^#/ and /^$/, print the line with a null instead of a newline terminator. The range is inclusive, so the blank line itself is also emitted as an empty field, which is where the extra blank lines come from.
  2. When we see a blank line, also print a newline.
  3. Ensure our output is terminated by a newline.
  4. Sort on the third null-delimited field (the RewriteRule line).
  5. Transform nulls back into newlines.
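
The extra blank lines mentioned above could be cleaned up with a small post-filter appended to the pipeline. The following is only a sketch: the name squeeze_blank is made up for illustration, and it treats any line containing only whitespace as blank.

```shell
# Hypothetical post-filter: collapse runs of blank lines into one and
# drop blank lines at the very start and end of the input.
squeeze_blank() {
    awk '
    NF {                                   # line has non-blank content
        if (seen && pending) print ""      # one separator between groups
        print
        seen = 1
        pending = 0
        next
    }
    { pending = 1 }                        # remember that a blank was seen
    '
}
```

It would be used as the last stage, for example `... | tr '\0' '\n' | squeeze_blank`.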

Etan Reisner answered Oct 03 '22 12:10


A sed version (the \n in the replacement text of the second sed assumes GNU sed):

sed -n '/^#/{N;h;n;H;x;s/\n/XnlX/g;x;s!.*\^/\([a-z]*\).*!\1!;G;s/\n/ /;p}' input \
         | sort |  sed 's/[^ ]* //;s/$/\n/;s/XnlX/\n/g'

Produces:

#ablah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/abc/.*(|/)$ http://www.blah.com/404.html [R=301,L,NC]

#xblah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/hkf(|/)$ http://www.blah.com/404.html [R=301,L,NC]

#blah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/xyz(|/)$ http://www.blah.com/404.html [R=301,L,NC]
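
Since the one-liner is dense, here is the same command spread over several lines with a comment for each step, as a reading aid. The wrapper name sort_blocks_sed and the LC_ALL=C setting are additions for illustration, and GNU sed is assumed for the \n in the final replacement. Note that the key pattern [a-z]* only captures lowercase letters after "^/".

```shell
# Hypothetical wrapper around the sed pipeline above: sort_blocks_sed FILE
sort_blocks_sed() {
    sed -n '
/^#/ {
    # pattern space holds the "#comment" line; append the RewriteCond line
    N
    # copy both lines to the hold space
    h
    # replace the pattern space with the next line (the RewriteRule)
    n
    # append the RewriteRule line to the hold space
    H
    # swap: the pattern space now holds all three lines
    x
    # join the three lines into one, using XnlX as a newline marker
    s/\n/XnlX/g
    # swap back: pattern space = RewriteRule line, hold space = joined block
    x
    # reduce the RewriteRule line to its sort key, e.g. abc
    s!.*\^/\([a-z]*\).*!\1!
    # append the joined block, then turn the separating newline into a space
    G
    s/\n/ /
    # print "key joined-block" as a single line
    p
}' "$1" |
    LC_ALL=C sort |
    # drop the key, add a blank line after each block, restore the newlines
    sed 's/[^ ]* //;s/$/\n/;s/XnlX/\n/g'
}
```

Each block in the result is followed by a blank line, including the last one.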

perreal answered Oct 02 '22 12:10