I know there's DBPedia for Wikipedia, but does something like that exist for Wiktionary? I'd like to get something like https://en.wiktionary.org/wiki/Category:en:Occupations into JSON or similar format.
Another way to go would be to load the Wiktionary category tables into MySQL from a Wikimedia data dump, e.g. enwiktionary-20190901-category.sql.gz.
Then use https://en.wiktionary.org/api/rest_v1/ to retrieve (and parse!) the HTML for the info you need.
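For the retrieval step, a simpler route than parsing HTML may be the MediaWiki Action API, which can return category members directly as JSON. A minimal sketch (stdlib only; the category name is the one from the question, and `list=categorymembers` is a documented Action API query):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://en.wiktionary.org/w/api.php"

def category_members_url(category, cmcontinue=None):
    """Build an Action API query URL for the members of a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        # Continuation token from a previous response, for paging
        params["cmcontinue"] = cmcontinue
    return API + "?" + urlencode(params)

def extract_titles(response):
    """Pull the page titles out of a categorymembers JSON response."""
    return [m["title"] for m in response["query"]["categorymembers"]]

# Usage (performs a live network request):
# with urlopen(category_members_url("Category:en:Occupations")) as resp:
#     print(extract_titles(json.load(resp)))
```

Large categories are paged; repeat the request with the `cmcontinue` value from each response until it is absent.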
Good luck!
There is DBpedia for Wikipedia, and there is DBnary for Wiktionary. See http://kaiko.getalp.org/about-dbnary
TL;DR: DBnary extracts 25 language editions of Wiktionary and produces an RDF dataset (using the OntoLex ontology) that can be imported into a quad store and queried. A new version is released twice a month.
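Once the dataset is in a quad store, it can be queried with SPARQL. A sketch, assuming the public endpoint advertised on the DBnary site and the standard OntoLex vocabulary (the endpoint URL and the exact modeling are assumptions; check the about page for the current details):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed public endpoint; verify on http://kaiko.getalp.org/about-dbnary
ENDPOINT = "http://kaiko.getalp.org/sparql"

# List a few lexical entries with their written forms (OntoLex vocabulary)
QUERY = """
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT ?entry ?form WHERE {
  ?entry a ontolex:LexicalEntry ;
         ontolex:canonicalForm/ontolex:writtenRep ?form .
} LIMIT 10
"""

def sparql_request(endpoint, query):
    """Build a POST request asking for JSON-formatted SPARQL results."""
    body = urlencode({"query": query}).encode()
    return Request(endpoint, data=body,
                   headers={"Accept": "application/sparql-results+json"})

# Usage (performs a live network request):
# import json
# from urllib.request import urlopen
# with urlopen(sparql_request(ENDPOINT, QUERY)) as resp:
#     print(json.load(resp)["results"]["bindings"])
```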
Drawback: not all data is extracted and modeled; you can file a feature request at the DBnary extractor GitLab: https://gitlab.com/gilles.serasset/dbnary
Categories are usually not extracted, as these come from template processing: extracting them would require transcluding every page of every edition for every dump, and transclusion is not cheap (especially when it involves Lua, as is the case for most pages in the English edition).
Note: I am the author of DBnary...