
Is there a way to extract Wiktionary data without scraping?

I know there's DBPedia for Wikipedia, but does something like that exist for Wiktionary? I'd like to get something like https://en.wiktionary.org/wiki/Category:en:Occupations into JSON or similar format.

Jonathan asked Sep 12 '25 16:09

2 Answers

Another way to go would be to load the Wiktionary category SQL dump into MySQL from the Wikimedia data dumps, e.g. enwiktionary-20190901-category.sql.gz.

Then use https://en.wiktionary.org/api/rest_v1/ to retrieve (and parse!) the HTML for the info you need.
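A minimal sketch of the REST half of this approach, using only the standard library: `page_html_url` builds the URL for the rest_v1 rendered-HTML endpoint, and a small `HTMLParser` subclass pulls the text of `<li>` elements (definitions on Wiktionary pages are list items). The function and class names are my own, and the actual network fetch is left to you; here the parser is exercised on an inline sample snippet.

```python
from html.parser import HTMLParser
from urllib.parse import quote

REST_BASE = "https://en.wiktionary.org/api/rest_v1"

def page_html_url(title: str) -> str:
    """URL of the rendered-HTML endpoint for a page title."""
    return f"{REST_BASE}/page/html/{quote(title, safe='')}"

class ListItemExtractor(HTMLParser):
    """Collect the plain text of top-level <li> elements."""
    def __init__(self):
        super().__init__()
        self.items, self._depth, self._buf = [], 0, []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.items.append("".join(self._buf).strip())
                self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

# Sample stand-in for the HTML you would fetch from page_html_url(...).
sample = "<ol><li>A person's <b>occupation</b>.</li><li>An activity.</li></ol>"
parser = ListItemExtractor()
parser.feed(sample)
print(page_html_url("baker"))
print(parser.items)
```

Real pages are much messier than the sample (nested lists, usage labels, quotations), so expect to refine the parsing for your use case.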

Good luck!

amirouche answered Sep 15 '25 05:09


There is DBpedia for Wikipedia, and there is DBnary for Wiktionary. See http://kaiko.getalp.org/about-dbnary

TL;DR: DBnary extracts 25 language editions of Wiktionary and produces an RDF dataset (using the OntoLex ontology) that can be imported into a quad store and queried. A new version is released twice a month.
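As a hedged sketch of what querying the dataset looks like, the snippet below builds a SPARQL request for English lexical entries, assuming the public endpoint DBnary hosts at http://kaiko.getalp.org/sparql and the standard OntoLex vocabulary. Only the request is constructed here (so it runs offline); pass it to `urllib.request.urlopen` to actually execute the query.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://kaiko.getalp.org/sparql"  # assumed public DBnary endpoint

# Ten English lexical entries with their written forms (OntoLex model).
QUERY = """
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>

SELECT ?entry ?form WHERE {
  ?entry a ontolex:LexicalEntry ;
         ontolex:canonicalForm/ontolex:writtenRep ?form .
  FILTER (lang(?form) = "en")
} LIMIT 10
"""

def build_request(query: str) -> Request:
    """POST request asking for SPARQL results as JSON."""
    return Request(
        ENDPOINT,
        data=urlencode({"query": query}).encode(),
        headers={"Accept": "application/sparql-results+json"},
    )

req = build_request(QUERY)
print(req.full_url, req.get_method())
```

The same query works against a local quad store after importing a DBnary dump, which avoids rate limits on the shared endpoint.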

One drawback: not all data is extracted and modeled. You can file a feature request at the DBnary extractor GitLab: https://gitlab.com/gilles.serasset/dbnary

The categories are usually not extracted, as they come from template processing: extracting them would require transcluding every page of every edition for every dump, and transclusion is not cheap (especially when it involves Lua, as is the case for most pages in the English edition).

Note: I am the author of DBnary.

dodecaplex answered Sep 15 '25 05:09