
How to download files from Wikimedia Commons by API?

How can I download a large number of audio (.ogg) files from Wikimedia Commons? Is it possible using the MediaWiki API?

Saku asked Mar 14 '23 14:03



1 Answer

You can use the MediaWiki API to get direct download URLs, not only for .ogg files but for any image or media file uploaded to Wikimedia Commons. With the URLs from the response, you can then download each file. Here is an example in C#:

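// Requires: using System; using System.Collections.Generic; using System.IO;
//           using System.Linq; using System.Net; using System.Xml.Linq;
private static void GetFiles(List<string> fileNames)
{
    // Build one API query URL for all file titles. The titles are escaped so
    // that characters such as '&' or non-ASCII letters do not break the URL.
    var titles = string.Join("|", fileNames.Select(n => "File:" + n));
    var url = "https://commons.wikimedia.org/w/api.php?action=query&format=xml" +
        "&prop=imageinfo&iiprop=url&titles=" + Uri.EscapeDataString(titles);
    // Note: Wikimedia's User-Agent policy asks clients to send a descriptive
    // User-Agent header; set one if your requests are rejected.
    using (var webResponse = (HttpWebResponse)WebRequest.Create(url).GetResponse())
    using (var reader = new StreamReader(webResponse.GetResponseStream()))
    {
        var response = reader.ReadToEnd();

        // Get the download URL of each file by parsing the XML response.
        // Titles that do not exist have no <ii> element and are skipped.
        var links = XElement.Parse(response).Descendants("ii")
            .Select(x => x.Attribute("url").Value);
        using (var client = new WebClient())
        {
            foreach (var link in links)
            {
                // Save the file to the current directory under its decoded name
                var fileName = Uri.UnescapeDataString(link.Substring(link.LastIndexOf("/") + 1));
                client.DownloadFile(link, fileName);
            }
        }
    }
}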

Usage:

//list of files to download
var fileNames = new List<string>() {
    "Flag of France.svg", "Black scorpion.jpg", "Stop.png",         //image
    "Jingle Bells.ogg", "Bach Astier 15.flac",                      //audio
    "Cable Car.webm", "Lion.ogv",                                   //video
    "Animalibrí.gif",                                               //animation
};

GetFiles(fileNames);

Note: The API limits the number of files per request:

Maximum number of values is 50 (500 for bots).

So, if you need to download more files, you will have to split the list into parts and send a separate request for each part.
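The splitting could look roughly like the sketch below. It is only one possible approach: the class name CommonsBatching and the method SplitIntoBatches are hypothetical helpers invented for this illustration, and the default batch size of 50 is taken from the limit quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonsBatching
{
    // Hypothetical helper (not part of the MediaWiki API or the code above):
    // splits the names into consecutive groups of at most batchSize items,
    // preserving the original order.
    public static List<List<string>> SplitIntoBatches(
        IEnumerable<string> fileNames, int batchSize = 50)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        return fileNames
            .Select((name, index) => new { name, index })
            .GroupBy(x => x.index / batchSize)           // items 0..49, 50..99, ...
            .Select(g => g.Select(x => x.name).ToList())
            .ToList();
    }
}
```

Each batch can then be passed to GetFiles in turn, for example with foreach (var batch in CommonsBatching.SplitIntoBatches(fileNames)) GetFiles(batch); so that no single request exceeds the limit.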

Termininja answered Apr 01 '23 02:04
