I would like to download the n-th image that Google gives me from the command line, i.e. with a command like wget.
To search for images of [something]
I just go to the page https://www.google.cz/search?q=[something]&tbm=isch
but how do I get the URL of the n-th search result so I can use it with wget?
First attempt
First you need to set a user agent so that Google will return the search results. Then we can look for the images and select the desired one. wget returns the Google results page as one single line, so we insert the missing newlines (one before every tag), keep only the <img> tags and extract the link from the chosen one. The index of the image is stored in the variable count.
$ count=10
$ imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - "www.google.be/search?q=something&tbm=isch" | sed 's/</\n</g' | grep '<img' | head -n"$count" | tail -n1 | sed 's/.*src="\([^"]*\)".*/\1/')
$ wget "$imagelink"
The image will now be in your working directory; you can tweak the last command (for example wget -O name.jpg "$imagelink") to choose the output file name.
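To see what each stage of that pipeline does without querying Google, here is a minimal sketch that runs the same sed/grep/head/tail chain over a made-up HTML fragment. The function name nth_thumbnail and the sample markup are hypothetical, and it assumes GNU sed, which understands \n in the replacement.

```shell
#!/bin/bash
# Hypothetical helper: print the src of the n-th <img> tag in HTML read on stdin.
# Same stages as the wget pipeline above; assumes GNU sed.
nth_thumbnail() {
    local n="$1"
    sed 's/</\n</g' |                    # start a new line before every tag
        grep '<img' |                    # keep only the lines holding an <img> tag
        head -n "$n" | tail -n 1 |       # select the n-th match
        sed 's/.*src="\([^"]*\)".*/\1/'  # keep only the src attribute value
}

# Made-up one-line page standing in for what wget returns (not real Google markup).
html='<html><body><a><img src="a.jpg"></a><div><img src="b.jpg"></div><img src="c.jpg"></body></html>'
printf '%s\n' "$html" | nth_thumbnail 2   # prints b.jpg
```

On the live page, html would simply be the output of the wget ... -qO - command.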
You can summarize these commands in a shell script:
#! /bin/bash
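count=${1}        # which result to fetch (1 = first)
shift
query="$*"        # the remaining arguments form the search terms
[ -z "$query" ] && exit 1 # insufficient arguments
# fetch the results page, put each tag on its own line, pick the count-th <img> and keep its src
imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - "www.google.be/search?q=${query// /+}&tbm=isch" | sed 's/</\n</g' | grep '<img' | head -n"$count" | tail -n1 | sed 's/.*src="\([^"]*\)".*/\1/')
# download the image itself
wget -qO google_image "$imagelink"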
Example usage:
$ ls
Documents
Downloads
Music
script.sh
$ chmod +x script.sh
$ bash script.sh 5 awesome
$ ls
Documents
Downloads
google_image
Music
script.sh
Now google_image should contain the fifth Google image result for 'awesome'. If you run into any bugs, let me know and I'll take care of them.
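In the script, head -n"$count" | tail -n1 is what picks the count-th match. A small sketch of one side effect: when fewer than count matches exist, it silently returns the last one, whereas sed -n "${count}p" prints nothing, which makes an out-of-range request easier to detect. The helper names are hypothetical.

```shell
#!/bin/bash
# Two ways to select line n of stdin (hypothetical helper names).
pick_head_tail() { head -n "$1" | tail -n 1; }   # as in the script
pick_sed()       { sed -n "${1}p"; }             # prints line n only if it exists

matches=$'first\nsecond\nthird'   # stand-in for three grep matches

pick_head_tail 2 <<< "$matches"   # second
pick_sed 2 <<< "$matches"         # second
pick_head_tail 5 <<< "$matches"   # third (falls back to the last match)
pick_sed 5 <<< "$matches"         # (no output)
```

With the sed variant, the script could exit when imagelink comes back empty instead of downloading the wrong picture.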
Better code
The problem with this code is that it grabs the thumbnails shown on the results page, so the pictures come back in low resolution. A better solution is as follows:
#! /bin/bash
# function to create all dirs til file can be made
function mkdirs {
file="$1"
dir="/"
# convert to full path
if [ "${file##/*}" ]; then
file="${PWD}/${file}"
fi
# dir name of following dir
next="${file#/}"
# while not filename
while [ "${next//[^\/]/}" ]; do
# create dir if doesn't exist
[ -d "${dir}" ] || mkdir "${dir}"
dir="${dir}/${next%%/*}"
next="${next#*/}"
done
# last directory to make
[ -d "${dir}" ] || mkdir "${dir}"
}
# get optional 'o' flag, this will open the image after download
getopts 'o' option
[[ $option = 'o' ]] && shift
# parse arguments
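count=${1}
shift
# the query is a single argument (quote multi-word queries, e.g. "funny dog");
# an optional second argument (output file or dir) is handled further below
query="$1"
[ -z "$query" ] && exit 1 # insufficient arguments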
# set user agent, customize this by visiting http://whatsmyuseragent.com/
useragent='Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:31.0) Gecko/20100101 Firefox/31.0'
# construct google link (spaces in the query become '+')
link="www.google.cz/search?q=${query// /+}&tbm=isch"
# fetch link for download
imagelink=$(wget -e robots=off --user-agent "$useragent" -qO - "$link" | sed 's/</\n</g' | grep '<a href.*\(png\|jpg\|jpeg\)' | sed 's/.*imgurl=\([^&]*\)&.*/\1/' | head -n "$count" | tail -n1)
imagelink="${imagelink%\%*}"
# get file extension (.png, .jpg, .jpeg)
ext=$(echo "$imagelink" | sed "s/.*\(\.[^\.]*\)$/\1/")
# set default save location and file name (change these as you like)
dir="$PWD"
file="google_image"
# get optional second argument, which defines the file name or dir
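if [[ $# -eq 2 ]]; then
if [ -d "$2" ]; then
dir="$2"
else
# treat it as a file name, possibly with a path: create the missing dirs first
mkdirs "$2"
dir="$(dirname -- "$2")"
file="$(basename -- "$2")"
fi
fi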
# construct the local file path: add 'echo "${google_image}"'
# after this line for debug output
google_image="${dir}/${file}"
# construct name, append number if file exists
if [[ -e "${google_image}${ext}" ]] ; then
i=0
while [[ -e "${google_image}(${i})${ext}" ]] ; do
((i++))
done
google_image="${google_image}(${i})${ext}"
else
google_image="${google_image}${ext}"
fi
# get actual picture and store in google_image.$ext
wget --max-redirect 0 -qO "${google_image}" "${imagelink}"
# if 'o' flag supplied: open image
[[ $option = "o" ]] && gnome-open "${google_image}"
# successful execution, exit code 0
exit 0
The comments should be self-explanatory; if you have any questions about the code (such as the long pipeline), I'll be happy to clarify the mechanics. Note that I had to set a more detailed user agent for wget. You may need a different one, though I don't expect that to be a problem; if it is, visit http://whatsmyuseragent.com/ and put the result in the useragent variable.
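Since the long pipeline is the least obvious part, here is an offline sketch of the same stages (the imagelink=... line and the two lines after it) run on made-up markup in the shape the script expects. The function name nth_imgurl and the sample HTML are hypothetical, not real Google output, and GNU sed/grep are assumed (\n in the replacement, \| alternation).

```shell
#!/bin/bash
# Hypothetical helper: print the full-size URL of the n-th image result in the
# old-style markup read on stdin (same stages as the script; GNU sed and grep).
nth_imgurl() {
    local n="$1" url
    url=$(sed 's/</\n</g' |
        grep '<a href.*\(png\|jpg\|jpeg\)' |   # anchors mentioning an image file
        sed 's/.*imgurl=\([^&]*\)&.*/\1/' |    # keep the imgurl parameter value
        head -n "$n" | tail -n 1)
    printf '%s\n' "${url%\%*}"   # drop anything from the last % onward, as the script does
}

# Made-up markup (not real Google output).
page='<div><a href="/imgres?imgurl=http://example.com/a/cat.png&amp;w=800"><img src="t1"></a><a href="/imgres?imgurl=http://example.com/b/dog.jpg&amp;w=640"><img src="t2"></a></div>'

link=$(printf '%s\n' "$page" | nth_imgurl 2)
echo "$link"                                 # http://example.com/b/dog.jpg
echo "$link" | sed 's/.*\(\.[^\.]*\)$/\1/'   # .jpg (the extension step)
```

Because the link points at the original file rather than the results-page thumbnail, this is what gives the higher resolution.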
To open the image after downloading it, use the -o flag, as in the example below. An optional argument after the query sets the output file name or the target directory; if you would like the script extended further, just let me know.
Example usage:
$ chmod +x getimg.sh
$ ./getimg.sh 1 dog
$ gnome-open google_image.jpg
$ ./getimg.sh -o 10 donkey
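The save logic in the script (the mkdirs function plus the numbered-suffix loop) can be condensed into one helper. This is a sketch, not the script's own code: free_path is a hypothetical name, and mkdir -p (GNU coreutils and BSD both have it) stands in for mkdirs.

```shell
#!/bin/bash
# Hypothetical helper: create the parent dirs of base, then print a path that does
# not overwrite an existing file: base.ext, else base(0).ext, base(1).ext, ...
free_path() {
    local base="$1" ext="$2" i=0
    mkdir -p -- "$(dirname -- "$base")"   # one-line stand-in for mkdirs
    if [[ ! -e "${base}${ext}" ]]; then
        printf '%s\n' "${base}${ext}"
        return
    fi
    while [[ -e "${base}(${i})${ext}" ]]; do
        i=$((i + 1))   # unlike ((i++)), never returns a failing status
    done
    printf '%s\n' "${base}(${i})${ext}"
}

# Demo in a throwaway directory, removed when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf -- "$tmp"' EXIT
free_path "$tmp/pics/google_image" .jpg   # prints $tmp/pics/google_image.jpg
touch "$tmp/pics/google_image.jpg"
free_path "$tmp/pics/google_image" .jpg   # prints $tmp/pics/google_image(0).jpg
```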
This is an addition to the answer provided by ShellFish. Much respect to them for working this out. :)
Google have recently changed the markup of their image results page, which has unfortunately broken ShellFish's code. I was running it every night in a cron job until about 4 days ago, when it stopped receiving search results. While investigating this, I found that Google have removed elements like imgurl and have moved a lot more into JavaScript.
My solution is an expansion of ShellFish's great code, with modifications to handle these Google changes and some 'enhancements' of my own.
It performs a single Google search, saves the results, bulk-downloads a specified number of images, then builds these into a single gallery-image using ImageMagick. Up to 1,000 images can be requested.
This bash script is available at https://git.io/googliser
Thank you.