I have a file which has a structure, but I don't know what format it is, nor how to parse it. The file extension is ttl, but I have never encountered this before.
Some lines from the file looks like this:
<http://data.europa.eu/esco/label/790ff9ed-c43b-435c-b6b3-6a4a6e8e8326>
a skosxl:Label ;
skosxl:literalForm "gérer des opérations d’allègement"@fr .
<http://data.europa.eu/esco/label/98570af6-b237-4cdd-b555-98fe3de26ef8>
a skosxl:Label ;
esco:hasLabelRole <http://data.europa.eu/esco/label-role/neutral> , <http://data.europa.eu/esco/label-role/male> , <http://data.europa.eu/esco/label-role/female> ;
skosxl:literalForm "particleboard machine technician"@en .
<http://data.europa.eu/esco/label/aaac5531-fc8d-40d5-bfb8-fc9ba741ac21>
a skosxl:Label ;
esco:hasLabelRole "http://data.europa.eu/esco/label-role/female" , "http://data.europa.eu/esco/label-role/standard-female" ;
skosxl:literalForm "pracovnice denní péče o děti"@cs .
And it goes on like this for 400 more MB. Additional attributes are added, for some, but not all nodes.
It reminds me of some form of XML, but I don't have much experience working with different formats. It also looks like something that can be modeles as a graph. Do you have any idea what data format it is, and how I could parse it in python?
Yes, @Phil is correct that is turtle syntax for storing RDF data.
I would suggest you import it into an RDF store of some sort rather than try and parse 400MB+ yourself. You can use GraphDB, Blazegraph, Virtuso and the list goes on. A search for RDF stores should give many other options.
Then you can use SPARQL to query the RDF store (which is like SQL for relational databases) using Python RDFlib. Here is an example from RDFLib.
That looks like turtle - a data description language for the semantic web.
The :has label and :label are specified for two different semantic libraries defined to share data (esco and skosxl there should not be much problem finding these libraries with a search engine, assuming the data is in the semantic web) . :literal form could be thought of as the value in an XML tag.
They represent ontologies in a data structure:
Subject : 10 Predicate : Name Object : John
As for python, read the data as a file, use the subject as the keys of a dictionary, put the values in a database, its unclear what you want to do with the data.
Semantic data is open, incomplete and could have an unusual, complex structure. The example above is very simple the primer linked above may help.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With