Shell script with Wget - If else nested inside for loop

I'm trying to write a shell script that reads a list of download URLs and checks whether they're still active. I'm not sure what's wrong with my current script (I'm new to this), and any pointers would be a huge help!

user@pc:~/test# cat sites.list

http://www.google.com/images/srpr/logo3w.png
http://www.google.com/doesnt.exist
notasite

Script:

#!/bin/bash
for i in `cat sites.list`
do
wget --spider $i -b
if grep --quiet "200 OK" wget-log; then
echo $i >> ok.txt
else
echo $i >> notok.txt
fi
rm wget-log
done

As is, the script writes every URL to notok.txt (the first Google URL should go to ok.txt). But if I run:

wget --spider http://www.google.com/images/srpr/logo3w.png -b

And then do:

grep "200 OK" wget-log

It greps the string without any problems. What noob mistake did I make with the syntax? Thanks m8s!

el-noobador asked Oct 24 '12 02:10


2 Answers

The -b option is sending wget to the background, so you're doing the grep before wget has finished.

Try without the -b option:

if wget --spider "$i" 2>&1 | grep --quiet "200 OK"; then
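
For context, here is a minimal sketch of how the whole loop from the question might look with that fix applied. The function name check_sites is made up for illustration, and the sketch assumes wget writes the "200 OK" status line to stderr (hence the 2>&1), as in the snippet above.

```shell
#!/bin/bash
# Sketch only: check_sites is a hypothetical helper name, not part of wget.
# Reads one URL per line from the file given as $1 and appends each URL to
# ok.txt or notok.txt in the current directory.
check_sites() {
  local list=$1 url
  while read -r url; do
    # Skip blank lines.
    [ -z "$url" ] && continue
    # No -b: wget stays in the foreground, so grep sees its complete output.
    if wget --spider "$url" 2>&1 | grep --quiet "200 OK"; then
      echo "$url" >> ok.txt
    else
      echo "$url" >> notok.txt
    fi
  done < "$list"
}
```

Calling check_sites sites.list from the directory that holds the list would then fill ok.txt and notok.txt. Because the output is piped straight into grep, there is no wget-log file to create or remove.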
German Garcia answered Sep 22 '22 21:09


There are a few issues with what you're doing.

  • Your for i in will have problems with lines that contain whitespace. Better to use while read to read individual lines of a file.
  • You aren't quoting your variables. What if a line in the file (or word in a line) starts with a hyphen? Then wget will interpret that as an option. You have a potential security risk here, as well as an error.
  • Creating and removing files isn't really necessary. If all you're doing is checking whether a URL is reachable, you can do that without temp files and the extra code to remove them.
  • wget isn't necessarily the best tool for this. I'd advise using curl instead.
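
To make the first point concrete, here is a small sketch that counts how many items each loop style sees for the same file. The helper names count_for and count_read and the sample file contents are invented for the demonstration; nothing is downloaded.

```shell
#!/bin/bash
# Sketch only: count_for and count_read are hypothetical helper names.

# Counts items the way the original script iterates: the unquoted command
# substitution is split on any whitespace, not just on newlines.
count_for() {
  local n=0 i
  for i in $(cat "$1"); do
    n=$((n + 1))
  done
  echo "$n"
}

# Counts items line by line; IFS= and -r keep each line intact.
count_read() {
  local n=0 line
  while IFS= read -r line; do
    n=$((n + 1))
  done < "$1"
  echo "$n"
}

# Demo input: two lines, the first of which contains a space.
demo=$(mktemp)
printf '%s\n' 'http://example.com/my file.png' '-b' > "$demo"
echo "for loop:   $(count_for "$demo") items"
echo "while read: $(count_read "$demo") items"
rm -f "$demo"
```

With this input the for loop reports three items and the while read loop two. The second line, -b, also shows why an unvalidated word is risky: handed to wget as-is, it would be read as the background option and not as a URL.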

So here's a better way to handle this...

#!/bin/bash

sitelist="sites.list"
curl="/usr/bin/curl"

# Some errors, for good measure...
if [[ ! -f "$sitelist" ]]; then
  echo "ERROR: Sitelist is missing." >&2
  exit 1
elif [[ ! -s "$sitelist" ]]; then
  echo "ERROR: Sitelist is empty." >&2
  exit 1
elif [[ ! -x "$curl" ]]; then
  echo "ERROR: I can't work under these conditions." >&2
  exit 1
fi

# Allow extended pattern matching (the @(...) and ?(...) patterns in case..esac below)
shopt -s extglob

while read -r url; do

  # remove comments
  url=${url%%#*}

  # skip empty lines
  if [[ -z "$url" ]]; then
    continue
  fi

  # Handle just ftp, http and https.
  # We could do full URL pattern matching, but meh.
  case "$url" in
    @(f|ht)tp?(s)://*)
      # Get just the numeric HTTP response code
      http_code=$($curl -sL -w '%{http_code}' "$url" -o /dev/null)
      case "$http_code" in
        200|226)
          # You'll get a 226 in ${http_code} from a valid FTP URL.
          # If all you really care about is that the response is in the 200's,
          # you could match against "2??" instead.
          echo "$url" >> ok.txt
          ;;
        *)
          # You might want different handling for redirects (301/302).
          echo "$url" >> notok.txt
          ;;
      esac
      ;;
    *)
      # If we're here, we didn't get a URL we could read.
      echo "WARNING: invalid url: $url" >&2
      ;;
  esac

done < "$sitelist"

This is untested. For educational purposes only. May contain nuts.
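
The comments in the script mention matching "2??" and handling redirects separately. Below is a minimal sketch of that idea, kept apart from the network call so it can be tried on its own. The function name classify_code and the three labels are made up for illustration.

```shell
#!/bin/bash
# Sketch only: classify_code is a hypothetical helper, not part of curl.
# Takes the three-digit code printed by curl's -w '%{http_code}' and
# prints a label for it.
classify_code() {
  case "$1" in
    2??) echo "ok" ;;        # any 2xx, which covers both 200 and FTP's 226
    3??) echo "redirect" ;;  # e.g. 301/302 when the redirect was not followed
    *)   echo "notok" ;;     # everything else, including 000 (no response)
  esac
}

classify_code 200   # prints "ok"
classify_code 302   # prints "redirect"
classify_code 404   # prints "notok"
```

In the loop above, the inner case could then switch on $(classify_code "$http_code"). Since the script passes -L to curl, redirects are normally followed and the reported code belongs to the final response, so the redirect branch would mostly matter if -L were dropped.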

ghoti answered Sep 25 '22 21:09