wget wikimedia image?

Question

I am trying to download an image from Wikimedia Commons by using a URL to a page in the file namespace:

wget http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG

All I get is a JPG file that I cannot open. The link leads to the file description page rather than the image itself, and that page has a link called "Full resolution" that points to the real image: http://upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG

How can I download the image when I only have the first link?

jitendra · Accepted Answer

You can try the following:

wget http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG -O output.html; wget $(cat output.html | grep fullMedia | sed 's/\(.*href="\/\/\)\([^ ]*\)\(" class.*\)$/\2/g')

The first wget fetches the page you specify and saves it as output.html. I browsed a few pages and found that the link to the high-resolution image sits inside a div with class=fullMedia. The second command extracts the image URL from that div and then fetches the image.

PS: As suggested above, bash is not a neat way of doing this. You should look at a tool that parses DOM trees instead.

Tags:

bash

shell

wget

wikimedia-commons

Altin Ukshini

