Convert huge linked data dumps (RDF/XML, JSON-LD, TTL) to TSV/CSV

Linked data collections are usually published in RDF/XML, JSON-LD, or TTL format, and large dumps are difficult to process with standard tools. What is a good way to convert an RDF/XML file to a TSV of linked-data triples?

I've tried OpenRefine, which should handle this, but a 10 GB file (e.g. the person authority data from the German National Library) is too much for a laptop with decent processing power.

I'm looking for software recommendations or some example Python/R code to do the conversion. Thanks!
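
For reference, this is roughly what I have in mind (a minimal rdflib sketch; the file name is made up, and it parses the whole graph into memory, which is exactly what breaks down at 10 GB):

    # Minimal sketch using rdflib: fine for small files, but it loads
    # the whole graph into memory, so it won't work for a 10 GB dump.
    import csv
    from rdflib import Graph

    g = Graph()
    g.parse("gnd_sample.rdf", format="xml")  # hypothetical file name

    with open("triples.tsv", "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for s, p, o in g:
            # Note: literals containing tabs/newlines would need escaping.
            writer.writerow([s, p, o])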

asked Oct 21 '25 by puslet88

2 Answers

Try these:

Lobid GND API

http://lobid.org/gnd/api

It supports OpenRefine (see the blog post) and a variety of other queries. The data is hosted as JSON-LD (see the context) in an Elasticsearch cluster, and the service offers a rich HTTP API.
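
As a minimal sketch of using the API from Python (the q/format/size parameters and the member, preferredName, and gndIdentifier fields follow the lobid-gnd documentation; adjust them to the fields you actually need):

    # Query the lobid-gnd search API and flatten a few fields to TSV.
    # Field names are taken from the lobid-gnd JSON-LD responses.
    import csv
    import requests

    resp = requests.get(
        "https://lobid.org/gnd/search",
        params={"q": "Goethe", "format": "json", "size": 100},
    )
    resp.raise_for_status()

    with open("gnd_hits.tsv", "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["gndIdentifier", "preferredName"])
        for hit in resp.json().get("member", []):
            writer.writerow([hit.get("gndIdentifier", ""), hit.get("preferredName", "")])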

Use a Triple Store

Load the data into a triple store of your choice, e.g. rdf4j. Many triple stores provide some sort of CSV serialization; together with SPARQL, this could be worth a try.
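
For example, assuming a local rdf4j server with the dump loaded into a repository called "gnd" (endpoint URL and repository name are assumptions for the sketch), SPARQLWrapper can fetch SELECT results directly as CSV:

    # Run a SPARQL SELECT against a local rdf4j repository and save the
    # result as CSV. Endpoint URL and repository name ("gnd") are
    # assumptions; adjust to your setup. For very large dumps, page
    # through the data with a LIMIT/OFFSET loop instead of one query.
    from SPARQLWrapper import SPARQLWrapper, CSV

    sparql = SPARQLWrapper("http://localhost:8080/rdf4j-server/repositories/gnd")
    sparql.setQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o }")
    sparql.setReturnFormat(CSV)

    with open("triples.csv", "wb") as f:
        f.write(sparql.query().convert())  # CSV results come back as bytes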

Catmandu

http://librecat.org/Catmandu/

A strong Perl-based data toolkit that comes with a useful collection of ready-to-use transformation pipelines.

Metafacture

https://github.com/metafacture/metafacture-core/wiki

A Java toolkit for designing transformation pipelines.

answered Oct 24 '25 by jschnasse

You could use the ontology editor Protege: there, you can query the data with SPARQL according to your needs and save the results as a TSV file. It might be important, however, to configure the software beforehand so that these amounts of data remain manageable.

answered Oct 24 '25 by Yahalnaut