 

How to download images programmatically from Wikimedia Commons without registering for a Bot account?

It seems the only way to get a Bot account approved is if the bot adds to or edits information already on Wikimedia. If you try to download images without a bot account, using some of the API libraries out there, you get error messages instead of the images. It looks like they block anyone not coming in from a browser. Does anyone else have experience with this? Am I missing something?

tomvon asked Sep 23 '09 17:09


People also ask

Can I use Wikimedia Commons images commercially?

All media files on Wikimedia Commons can be used by anyone, including commercially, and each media file has information about which license it uses. The most common licenses are Creative Commons licenses, which require the author to be credited.

Are images on Wikimedia Commons copyright free?

Wikimedia Commons only accepts free content, that is, images and other media files that are not subject to copyright restrictions which would prevent them being used by anyone, anytime, for any purpose.

Do you have to cite images from Wikimedia Commons?

Wikimedia Commons is similar to Wikipedia. All images, sounds, and videos are contributed by the public and are free to use. However, they must be cited for attribution because most have a Creative Commons license.


2 Answers

Having just done this myself I feel I should share:

http://www.mediawiki.org/wiki/API:Allimages

This API documentation shows that you can list images with a query such as:

http://en.wikipedia.org/w/api.php?action=query&list=allimages&aiprop=url&format=xml&ailimit=10&aifrom=Albert

With aiprop=url, each result includes the URL of the image file you are looking for.
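A minimal Python sketch of the same query, for illustration only. It assumes format=json instead of the format=xml used in the URL above, and it assumes the JSON response keeps the image entries under query.allimages; the function names are made up for this example. For Commons itself, the endpoint would presumably be https://commons.wikimedia.org/w/api.php.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the example URL above (https assumed).
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_allimages_query(start_from, limit=10, endpoint=API_ENDPOINT):
    """Build a list=allimages query URL (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "allimages",
        "aiprop": "url",      # ask for the file URL of each image
        "format": "json",     # assumption: JSON rather than XML
        "ailimit": limit,
        "aifrom": start_from,
    }
    return endpoint + "?" + urlencode(params)


def extract_image_urls(response_text):
    """Pull the 'url' field out of each entry in query.allimages."""
    data = json.loads(response_text)
    images = data.get("query", {}).get("allimages", [])
    # Skip entries that carry no 'url' field.
    return [img["url"] for img in images if "url" in img]
```

build_allimages_query("Albert") yields a URL equivalent to the one above, and extract_image_urls() takes the text of the response once you have fetched it.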

Phil Hannent answered Sep 18 '22 01:09


Can you explain exactly what you want to do, what you've tried, and what error message you got? The question isn't very clear.

What libraries have you tried? As long as you're not aggressive, there are no restrictions on downloading Wikimedia content. Some User-Agents are banned from editing to avoid stupid spamming, but really, I've never heard of any restrictions on downloading.
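One thing worth ruling out, as a guess rather than a diagnosis: Wikimedia's User-Agent policy asks clients to identify themselves, and a library that sends a generic default User-Agent may get an error back instead of the file. A rough Python sketch that sets a descriptive one; the tool name and contact details are placeholders, and the helper names are made up for this example.

```python
import urllib.request

# Placeholder identity: replace with your own tool name and contact details.
USER_AGENT = "ExampleImageFetcher/0.1 (https://example.org/contact; you@example.org)"


def build_request(url, user_agent=USER_AGENT):
    """Wrap a URL in a Request that carries a descriptive User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest_path, user_agent=USER_AGENT, timeout=30):
    """Fetch one file and write it to dest_path (performs a network call)."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp, \
            open(dest_path, "wb") as out:
        out.write(resp.read())
```

If the errors go away with this header set, the library's default User-Agent was probably the problem.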

If you are trying to scrape a massive number of images by downloading them through Commons, you're doing it wrong (tm). If you are trying to get a few images, anywhere from 10 to 200, you should be able to write a decent tool in a few lines of code, provided that you respect the throttling requirement: when the API tells you to slow down, do it, or the sysadmins are likely to kick you out.
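A sketch of what respecting the throttling could look like in Python. It assumes the server signals "slow down" with HTTP 429 or 503 and an optional Retry-After header in seconds; that is an assumption for illustration, not a description of exactly how the Wikimedia API does it. The fetch argument is a hypothetical callable returning (status, headers, body), so any HTTP library can be plugged in.

```python
import time


def fetch_politely(fetch, url, max_attempts=5, default_wait=5.0, sleep=time.sleep):
    """Call fetch(url), backing off whenever the server asks us to slow down.

    fetch is a hypothetical callable returning (status, headers, body).
    """
    for _ in range(max_attempts):
        status, headers, body = fetch(url)
        # Assumption: 429 and 503 mean "slow down"; anything else is
        # handed back to the caller unchanged.
        if status not in (429, 503):
            return status, body
        retry_after = headers.get("Retry-After", "")
        # Use the server's hint when it is a plain number of seconds.
        wait = float(retry_after) if retry_after.isdigit() else default_wait
        sleep(wait)
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
```

Downloading the images one at a time through something like this, rather than in parallel, keeps a 10-to-200 image job well within "not aggressive".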

If you need a complete image dump (we're talking about a few TB), try asking on wikitech-l. We had torrents available when there were fewer images; now it's more complicated, but still doable.

About bot accounts: how deeply have you looked into the system? You need a bot account for fast, unsupervised edits. Bot privileges also unlock a few extras, such as increased query sizes. But remember, a bot account is simply an augmented user account. Have you tried running anything with a regular account?

Nicolas Dumazet answered Sep 20 '22 01:09