Linked data collections are usually published in RDF/XML, JSON-LD, or Turtle (TTL) format, and relatively large data dumps are fairly difficult to process. What is a good way to convert an RDF/XML file to a TSV of linked-data triples?
I've tried OpenRefine, which should handle this, but a 10 GB file (e.g. the person authority data from the German National Library) is too much to process on a laptop with decent processing power.
I'm looking for software recommendations or some example Python/R code to do the conversion. Thanks!
Try these:
Lobid GND API
http://lobid.org/gnd/api
Supports OpenRefine (see blog post) and a variety of other queries. The data is hosted as JSON-LD (see context) in an Elasticsearch cluster, and the service offers a rich HTTP API.
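As an illustration, here is a minimal Python sketch of querying that HTTP API with requests; the exact endpoint, parameters, and response field names are assumptions based on the linked documentation, so check them against the docs before relying on this.

```python
import requests

# Query the lobid GND search API (endpoint and parameters assumed from the docs above).
resp = requests.get(
    "https://lobid.org/gnd/search",
    params={"q": "Goethe", "format": "json", "size": 10},
    timeout=30,
)
resp.raise_for_status()
data = resp.json()

# Print a couple of fields per hit; "member", "gndIdentifier" and
# "preferredName" are assumed field names from the JSON-LD responses.
for hit in data.get("member", []):
    print(hit.get("gndIdentifier"), hit.get("preferredName"), sep="\t")
```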
Use a Triple Store
Load the data into a triple store of your choice, e.g. RDF4J. Many triple stores provide some form of CSV or TSV serialization of query results, so together with SPARQL this could be worth a try.
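Once the data is loaded, a SPARQL SELECT over the standard protocol can be written straight to TSV. A minimal Python sketch, assuming a local RDF4J server with a repository called "gnd" (the endpoint URL is a placeholder; adjust it to your own setup):

```python
import requests

# Placeholder endpoint; point this at your own SPARQL-capable triple store.
endpoint = "http://localhost:8080/rdf4j-server/repositories/gnd"
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"

# SPARQL 1.1 endpoints can usually return SELECT results as TSV via content negotiation.
resp = requests.get(
    endpoint,
    params={"query": query},
    headers={"Accept": "text/tab-separated-values"},
    timeout=300,
)
resp.raise_for_status()

with open("triples.tsv", "w", encoding="utf-8") as out:
    out.write(resp.text)
```

For a 10 GB dump you would typically add LIMIT/OFFSET paging or export per named graph rather than pulling everything in one request.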
Catmandu
http://librecat.org/Catmandu/
A powerful Perl-based data toolkit that comes with a useful collection of ready-to-use transformation pipelines.
Metafacture
https://github.com/metafacture/metafacture-core/wiki
A Java toolkit for designing transformation pipelines.
You could use the ontology editor Protégé: there you can query the data with SPARQL according to your needs and save the results as a TSV file. It may be necessary, however, to configure the software beforehand (e.g. its memory settings) to make this amount of data manageable.
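If Protégé struggles with a file of this size, the same SPARQL-to-TSV idea can be sketched in Python with rdflib; this assumes the dump (or an extract of it) fits in memory, and the file names below are placeholders:

```python
from rdflib import Graph

# rdflib loads the whole graph into memory, so this suits smaller extracts;
# a full 10 GB dump would need a streaming parser or a triple store instead.
g = Graph()
g.parse("gnd-sample.rdf", format="xml")  # placeholder file name

# The same kind of query you would run in Protégé.
results = g.query("SELECT ?s ?p ?o WHERE { ?s ?p ?o }")

# Write one triple per line; values are written unescaped, so literals
# containing tabs or newlines would need extra handling.
with open("triples.tsv", "w", encoding="utf-8") as out:
    for s, p, o in results:
        out.write(f"{s}\t{p}\t{o}\n")
```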