 

How to wget the most recent file of a directory

Tags:

linux

bash

wget

I would like to write a bash script that downloads and installs the latest daily build of a program (RStudio). Is it possible to make wget download only the most recent file in the directory http://www.rstudio.org/download/daily/desktop/ ?

ECII asked Dec 09 '22 17:12


2 Answers

The files seem to be sorted by the release date, with each new release being a new entry with a new name reflecting the version number change, so checking timestamps of a certain file seems unnecessary.
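If you would rather not depend on the page order at all, the version number in the file names can pick the newest build instead. A minimal sketch, assuming names of the form rstudio-<version>-amd64.deb (a guess at the naming scheme) and GNU sort's -V (version sort) option; newest_by_version is a hypothetical helper name:

```shell
#!/bin/bash
# Read one package file name per line on stdin and print the one with the
# highest version. sort -V compares digit runs numerically, so 0.97.10
# ranks above 0.97.9 (a plain sort would get this wrong).
newest_by_version() {
    sort -V | tail -n 1
}
```

For example, piping rstudio-0.97.9-amd64.deb, rstudio-0.97.10-amd64.deb and rstudio-0.96.331-amd64.deb into it prints rstudio-0.97.10-amd64.deb.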

Also, the link you provided points to a "directory", which is really just a web page. AFAIK, there is no such thing as a directory in HTTP (a protocol that serves you data at a given address). What you see is a listing generated by the server that resembles Windows folders for ease of use, but it's still a web page.
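Since the listing is just HTML, "listing the directory" amounts to pulling the link targets out of that page. A rough sketch, assuming the server writes double-quoted href attributes; grep and sed are not an HTML parser, so this only suits simple auto-generated index pages, and list_links is a hypothetical name:

```shell
#!/bin/bash
# Print every href="..." target found in an HTML page read from stdin,
# one per line, with the surrounding href="..." stripped off.
list_links() {
    grep -o 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}
```

In practice you would feed it the page itself, e.g. wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64/ | list_links.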

That said, you can scrape that web page. The following code downloads the file listed first (assuming the first one is the most recent):

#!/bin/bash

# Save the listing page, then keep the first link ending in amd64.deb
# (assumed to be the most recent build).
wget -q -O tmp.html http://www.rstudio.org/download/daily/desktop/ubuntu64/
RELEASE_URL=$(grep -m 1 -o -E 'https[^<>"]*amd64\.deb' tmp.html | head -n 1)
rm tmp.html

# TODO Check if the old package name is the same as in RELEASE_URL.

# If not, then get the new version.
wget -q "$RELEASE_URL"

Now you can check it against your local most-recent version, and install if necessary.
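One way to do that check, as a minimal sketch: remember the file name of the last package fetched in a small state file and compare it with the scraped URL. The state-file path and both function names are hypothetical choices, not part of the script above.

```shell
#!/bin/bash
# Hypothetical state file holding the name of the last package fetched.
STATE_FILE=${STATE_FILE:-"$HOME/.rstudio-daily-last"}

# Succeed (status 0) when the file name at the end of URL differs from the
# recorded one, or when nothing has been recorded yet.
is_new_release() {
    local name=${1##*/}
    [ ! -f "$STATE_FILE" ] || [ "$(cat "$STATE_FILE")" != "$name" ]
}

# Record the file name at the end of URL as the last package fetched.
remember_release() {
    printf '%s\n' "${1##*/}" > "$STATE_FILE"
}
```

Usage: if is_new_release "$RELEASE_URL"; then wget -q "$RELEASE_URL" && remember_release "$RELEASE_URL"; fi. Recording the name only after a successful download keeps a failed fetch from being treated as up to date.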

EDIT: Updated version, which does simple version checking and installs the package.

#!/bin/bash

MY_PATH=$(dirname "$0")
RES_DIR="$MY_PATH/res"

# Piping from stdout suggested by Chirlo.
# The match stops at the first single or double quote after "https".
RELEASE_URL=$(wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64/ | grep -m 1 -o "https[^'\"]*" | head -n 1)

if [ -z "$RELEASE_URL" ]; then
    echo "Package index not found. Maybe the server is down?"
    exit 1
fi

mkdir -p "$RES_DIR"
NEW_PACKAGE=${RELEASE_URL##*/}   # file name part of the URL
OLD_PACKAGE=$(ls "$RES_DIR")

if [ "$OLD_PACKAGE" != "$NEW_PACKAGE" ]; then

    cd "$RES_DIR" || exit 1
    [ -n "$OLD_PACKAGE" ] && rm -f -- "$OLD_PACKAGE"

    echo "New version found. Downloading..."
    wget -q "$RELEASE_URL"

    if [ ! -e "$NEW_PACKAGE" ]; then
        echo "Package not found."
        exit 1
    fi

    echo "Installing..."
    sudo dpkg -i "$NEW_PACKAGE"

else
    echo "rstudio up to date."
fi

And a couple of comments:

  • The script keeps a local res/ dir holding the latest version (exactly one file) and compares its name with the newly scraped package name. This is a crude check: having the file doesn't mean it was ever installed successfully. It would be better to parse the output of dpkg -l, but the installed package name might differ slightly from the scraped one.
  • You will still need to enter the password for sudo, so it won't be fully automatic. There are a few ways around this, though if you run it unsupervised you might hit the problem described above.
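A rough sketch of the dpkg-based check mentioned in the first comment, assuming the installed package is named rstudio and the daily builds are named rstudio-<version>-amd64.deb (both guesses, as are the function names). It asks dpkg-query for the installed version instead of parsing dpkg -l, and treats any difference as a reason to install. Debian version strings can carry an epoch or revision, so the two strings may not always line up exactly.

```shell
#!/bin/bash
# Print the version part of a name such as rstudio-0.97.123-amd64.deb
# (the naming scheme is an assumption).
version_from_deb_name() {
    local name=${1##*/}          # drop any URL or directory prefix
    name=${name%.deb}            # rstudio-0.97.123-amd64
    name=${name%-*}              # rstudio-0.97.123 (drop the architecture)
    printf '%s\n' "${name#*-}"   # 0.97.123 (drop the package name)
}

# Print the installed version of a package, or nothing if it is absent.
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}

# Succeed when the scraped build differs from the installed one.
# $1 = package URL or file name, $2 = package name (default: rstudio).
needs_install() {
    local new installed
    new=$(version_from_deb_name "$1")
    installed=$(installed_version "${2:-rstudio}")
    [ "$new" != "$installed" ]
}
```

In the script above, needs_install "$RELEASE_URL" could replace the comparison against the contents of res/.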
Richard Pump answered Dec 11 '22 07:12


A slightly cleaner variation of @Richard Pump's answer:

RELEASE_URL=$(wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64 | grep -o -m 1 "https[^'\"]*")

# check version from name ...

wget "$RELEASE_URL"

This avoids creating a temporary file by writing the HTML page to stdout (-O -) and filtering it directly.
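Because the filter reads from stdin, it can also be wrapped in a function and tried on a saved page without touching the network. A minimal sketch; first_release_url is a hypothetical name, and the pattern assumes the links are quoted (a single or double quote ends the match):

```shell
#!/bin/bash
# Print the first https link in an HTML page read from stdin.
# grep -o can emit several matches from one line, so head keeps only one.
first_release_url() {
    grep -o "https[^'\"]*" | head -n 1
}
```

Usage: RELEASE_URL=$(wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64 | first_release_url)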

Chirlo answered Dec 11 '22 07:12