Wiki quotes API?

I would like to get a structured version of a Wikiquote page as JSON (basically, I need all the phrases).

Example: http://en.wikiquote.org/wiki/Fight_Club_(film)

I tried with: http://en.wikiquote.org/w/api.php?format=xml&action=parse&page=Fight_Club_(film)&prop=text

But that returns the page's entire HTML source. I need each phrase as an element of an array.

How could I achieve that with DBpedia?

http://f.cl.ly/items/2v3w1U2c0J0z1M0V0k0b/Schermata%2012-2456269%20alle%2013.06.24.png

sparkle asked Dec 07 '12 12:12


2 Answers

For one thing, I am not sure whether you can query Wikiquote through DBpedia. Secondly, DBpedia only gives you infobox data in a structured way; it does not give you the article content in a structured way. Instead, with a little bit of trouble, you can use the MediaWiki API to get the data.


EDIT

The URI you are trying returns the page text, which makes things easier, though it does not solve the problem completely.

Try this piece of code in your console:

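require 'json'
require 'open-uri'
require 'nokogiri'

# Fetch the rendered page as JSON; the HTML sits under parse -> text -> "*"
url = "http://en.wikiquote.org/w/api.php?format=json&action=parse&page=Fight_Club_%28film%29&prop=text"
content = JSON.parse(URI.open(url).read)
data = content['parse']['text']['*']

# Parse the HTML and collect the text of every list item
xpath_data = Nokogiri::HTML(data)
xpath_data.xpath("//ul/li").map { |data_node| data_node.text }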

This is the closest I have come to an answer. Of course, it is not completely right, because you will get a lot of unnecessary data. But if you dig into Nokogiri and XPath and find out how to pinpoint the nodes you need, you can get a solution that gives you correct quotes at least 90% of the time.

djd answered Nov 16 '22 03:11


Just change the format to JSON. Look up the MediaWiki API documentation for more details. http://en.wikiquote.org/w/api.php?format=json&action=parse&page=Fight_Club_(film)&prop=text
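Note that the JSON response still wraps the page's HTML in a single string, so it has to be pulled out before any quotes can be extracted. A minimal sketch, assuming the response nests the HTML under `parse`, `text`, `*`; the sample body is a shortened, made-up stand-in and `page_html` is a hypothetical helper name:

```ruby
require 'json'

# Hypothetical, heavily shortened response body; a real response
# carries the full page HTML in the "*" field.
SAMPLE_BODY = '{"parse":{"title":"Fight Club (film)","text":{"*":"<ul><li>Sample quote.</li></ul>"}}}'

# Hypothetical helper: returns the rendered HTML string, or nil when the
# expected keys are missing (for example, when the API reports an error).
def page_html(body)
  JSON.parse(body).dig('parse', 'text', '*')
end

html = page_html(SAMPLE_BODY)
puts html
```

The returned string is plain HTML, so it still needs an HTML parser to split it into individual quotes.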

R891 answered Nov 16 '22 01:11