I have a file with several hundred lines formatted like this:
#blah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/xyz(|/)$ http://www.blah.com/404.html [R=301,L,NC]
#xblah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/hkf(|/)$ http://www.blah.com/404.html [R=301,L,NC]
#ablah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/abc/.*(|/)$ http://www.blah.com/404.html [R=301,L,NC]
I would like to write an awk/sed script that sorts this file alphabetically by the second field of the third line of each group (the RewriteRule pattern). In this file that is "abc", "hkf" or "xyz", but it could be anything: these are the redirects being created in this Apache redirects file.
I figured that what I wanted to do was:
My expected output would look like this:
#ablah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/abc/.*(|/)$ http://www.blah.com/404.html [R=301,L,NC]
#xblah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/hkf(|/)$ http://www.blah.com/404.html [R=301,L,NC]
#blah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/xyz(|/)$ http://www.blah.com/404.html [R=301,L,NC]
Does this make sense? Is there a better way to do this?
P.S. My intent is to make the script portable so it can be used on several files with this structure. When suggesting code, please spell it out as much as possible so that a rank newb like me can start to understand how to tackle this problem efficiently and extend the end result.
Any and all help greatly appreciated.
Linux systems have a command named sort, which sorts the lines of a file alphabetically (or numerically, with -n) and prints the result to standard output (usually the terminal screen). The -k flag selects which field to sort on.
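To see what -k does on this data, here is a minimal sketch that sorts just the RewriteRule lines by their second blank-separated field, i.e. the pattern such as ^/abc/.*(|/)$. The function name sort_rules_by_pattern is made up for illustration, and LC_ALL=C is an added assumption to get plain byte order regardless of locale. On its own this loses the #comment and RewriteCond lines that belong to each rule, which is why the answers here first join each group into a single line.

```shell
#!/bin/sh
# Hypothetical helper: print only the RewriteRule lines, ordered by their
# 2nd blank-separated field (the rule pattern, e.g. ^/abc/.*(|/)$).
# It does NOT keep each rule together with its #comment and RewriteCond.
sort_rules_by_pattern() {
  grep '^RewriteRule' |      # keep only the rule lines
    LC_ALL=C sort -k2,2      # compare field 2 only, in byte order
}

# Usage: sort_rules_by_pattern < input.txt
```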
awk and sed become especially powerful when combined with each other and with tools like sort. You combine them with Unix pipes: the | operator feeds one command's output into the next command's input.
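Applied to this problem, a common pipe pattern is decorate, sort, undecorate: awk turns each group into one line prefixed with its sort key, sort orders those lines, and cut and tr restore the original layout. This is a sketch under stated assumptions: every group starts with a line beginning with #, the key is the second field of the group's RewriteRule line, blank lines may be dropped, and the byte \001 never appears in the file. sort_redirect_groups is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: reads the redirects file on stdin, writes it sorted.
# Assumptions: each group starts with a "#" line, its RewriteRule line holds
# the sort key in field 2, blank lines can be dropped, and the byte \001
# never occurs in the input.
sort_redirect_groups() {
  awk '
    # Print the finished group as: key TAB line1 \001 line2 \001 line3
    function emit_group() {
      if (group != "") print key "\t" group
      group = ""; key = ""
    }
    /^#/           { emit_group() }   # a "#" line starts a new group
    NF == 0        { next }           # drop blank separator lines
    /^RewriteRule/ { key = $2 }       # key: the pattern, e.g. ^/abc/.*(|/)$
    { group = (group == "" ? $0 : group "\001" $0) }
    END { emit_group() }
  ' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |  # byte-order sort on the key only
    cut -f2- |                                  # drop the key column
    tr '\001' '\n'                              # restore the line breaks
}

# Usage: sort_redirect_groups < input.txt > sorted.txt
```

Because only the key column is compared, a group can contain any number of lines, not just three.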
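You can do the whole operation in GNU awk (gawk). The script relies on two gawk extensions, the three-argument match() and asort(), and on RS="" (paragraph mode), so it expects the groups to be separated by blank lines. Run it with:
gawk -f sort.awk input.txt
where sort.awk is: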
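BEGIN {
    RS = ""    # paragraph mode: each blank-line-separated group is one record
}
{
    # capture the path after "RewriteRule ^/" up to "(|", e.g. abc/.*
    match($0, /RewriteRule \^\/(.*)\(\|/, a)
    key[NR] = a[1] "\t" NR    # sort key with the record number appended
    block[NR] = $0            # remember the whole group
}
END {
    asort(key)                # gawk only: sort the keys alphabetically
    for (i = 1; i <= NR; i++) {
        split(key[i], a, "\t")    # a[2] is the original record number
        print block[a[2]]
        printf "\n"               # blank line after each group
    }
}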
Produces:
#ablah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/abc/.*(|/)$ http://www.blah.com/404.html [R=301,L,NC]
#xblah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/hkf(|/)$ http://www.blah.com/404.html [R=301,L,NC]
#blah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/xyz(|/)$ http://www.blah.com/404.html [R=301,L,NC]
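Since the question asks for something portable, here is a sketch of the same single-program idea restricted to POSIX awk features, so it should not need gawk's asort() or three-argument match(). It starts a new group at each # line instead of relying on blank lines, uses the second field of the RewriteRule line as the key, and sorts the group numbers with a simple insertion sort in the END block, which is adequate for a few hundred groups. The function name sort_groups_posix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper using only POSIX awk features (no asort(), no
# 3-argument match()). Assumes each group starts with a "#" line; blank
# lines and anything before the first "#" line are dropped.
sort_groups_posix() {
  LC_ALL=C awk '
    /^#/           { n++ }                   # a "#" line starts group n
    NF == 0        { next }                  # drop blank lines
    n == 0         { next }                  # drop lines before the first group
    /^RewriteRule/ { key[n] = $2 "" }        # "" forces a string comparison later
    { block[n] = (block[n] == "" ? $0 : block[n] "\n" $0) }
    END {
      # insertion sort: order[1..n] lists group numbers by ascending key
      for (i = 1; i <= n; i++) {
        j = i - 1
        while (j > 0 && key[order[j]] > key[i]) {
          order[j + 1] = order[j]
          j--
        }
        order[j + 1] = i
      }
      for (i = 1; i <= n; i++) print block[order[i]]
    }
  '
}

# Usage: sort_groups_posix < input.txt > sorted.txt
```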
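Your idea seemed a simple enough method, and this works for me on your test data. It assumes the groups are separated by blank lines and relies on GNU tools (GNU sort accepts -t'\0' as a NUL separator). It does add extra blank lines, though, and I'm not focused enough at the moment to sort that out.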
awk '/^#/,/^$/ {printf "%s\0",$0} /^$/ {print ""} END {print ""}' 20250937.input | sort -t'\0' -k3,3 | tr '\0' '\n'
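One way to avoid the extra blank lines, sketched here under the same assumptions as that answer (groups separated by blank lines, field 3 of each group being its RewriteRule line): read the file in paragraph mode (RS = ""), join each group's lines with the byte \001 instead of NUL, so nothing depends on GNU sort's -t'\0' handling, and print exactly one blank line between groups when splitting them back up. sort_paragraphs is a made-up name, and \001 is assumed never to occur in the input.

```shell
#!/bin/sh
# Hypothetical helper. Assumes groups are separated by blank lines, that
# field 3 of each group (its third line) is the RewriteRule line, and that
# the byte \001 never occurs in the input.
sort_paragraphs() {
  awk 'BEGIN { RS = "" }                       # paragraph mode: one group per record
       { gsub(/\n/, "\001"); print }' |        # join the group into one line
    LC_ALL=C sort -t "$(printf '\001')" -k3,3 |  # sort on the RewriteRule line
    awk '{ if (NR > 1) print ""                  # one blank line between groups
           gsub(/\001/, "\n"); print }'          # split the group back up
}

# Usage: sort_paragraphs < input.txt > sorted.txt
```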
A sed version (it relies on GNU sed, which accepts \n in the replacement text):
sed -n '/^#/{N;h;n;H;x;s/\n/XnlX/g;x;s!.*\^/\([a-z]*\).*!\1!;G;s/\n/ /;p}' input \
| sort | sed 's/[^ ]* //;s/$/\n/;s/XnlX/\n/g'
Produces:
#ablah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/abc/.*(|/)$ http://www.blah.com/404.html [R=301,L,NC]
#xblah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/hkf(|/)$ http://www.blah.com/404.html [R=301,L,NC]
#blah
RewriteCond %{HTTP_HOST} www.blah.com [NC]
RewriteRule ^/xyz(|/)$ http://www.blah.com/404.html [R=301,L,NC]