I have been exploring the RDF Triple Store and Semantic Search capabilities of MarkLogic 7, querying with SPARQL. I was able to perform some basic operations, such as:
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

sem:rdf-insert(
  sem:triple(
    sem:iri("http://example.org/ns/people#m"),
    sem:iri("http://example.com/ns/person#firstName"),
    "Sam"),
  (), (), "my collection")
which creates a triple that I can then query using the following SPARQL:
PREFIX ab: <http://example.org/ns/people#>
PREFIX ac: <http://example.com/ns/person#>
SELECT ?Name
WHERE
{ ab:m ac:firstName ?Name . }
which retrieves "Sam" as the result. Edit: In my use case, I have a delimited (structured) file containing 1 billion records that I ingested into MarkLogic using MLCP. Each record is stored as a document like:
<root>
<ID>1000-000-000--000</ID>
<ACCOUNT_NUM>9999</ACCOUNT_NUM>
<NAME>Vronik</NAME>
<ADD1>D7-701</ADD1>
<ADD2>B-Valentine</ADD2>
<ADD3>Street 4</ADD3>
<ADD4>Fifth Avenue</ADD4>
<CITY>New York</CITY>
<STATE>NY</STATE>
<HOMPHONE>0002600000</HOMPHONE>
<BASEPHONE>12345</BASEPHONE>
<CELLPHONE>54321</CELLPHONE>
<EMAIL_ADDR>[email protected]</EMAIL_ADDR>
<CURRENT_BALANCE>10000</CURRENT_BALANCE>
<OWNERSHIP>JOINT</OWNERSHIP>
</root>
Now, I want to use RDF/Semantic feature for my dataset above.
However, I am not able to understand whether I need to convert the above document to RDF as shown below (illustrated for <NAME>), assuming this is the right way:
<sem:triple>
  <sem:subject>unique/uri/Person</sem:subject>
  <sem:predicate>unique/uri/Name</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string"
      xml:lang="en">Vronik</sem:object>
</sem:triple>
and then ingest these docs in ML and search using SPARQL, or do I need to just ingest my documents and then separately ingest triples obtained from external sources and somehow (how..??) link them to my documents and then query using SPARQL? Or is there some other way that I ought to do this?
Each document can contain multiple triples; the number of triples stored per document is managed by MarkLogic Server and is not a user configuration. Ingested triples are indexed by the triple index, which makes them queryable with SPARQL, XQuery, or a combination of both.
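As a sketch of that combination, the SPARQL query from the question can be run from XQuery via sem:sparql (the prefixes are the example IRIs used above):

```xquery
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

(: Run SPARQL from within XQuery; the result is a sequence
   of solution maps, one map per matching row. :)
sem:sparql('
  PREFIX ab: <http://example.org/ns/people#>
  PREFIX ac: <http://example.com/ns/person#>
  SELECT ?Name
  WHERE { ab:m ac:firstName ?Name . }
')
```

Each solution is a map:map, so individual bindings can be pulled out with map:get and fed into further XQuery processing.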
MarkLogic Server is designed to securely store and manage a variety of data to run transactional, operational, and analytical applications.
MarkLogic Semantics acts as the glue for master data, providing an ideal model for reference data and metadata (provenance, lineage, etc.). MarkLogic stores entity data such as Customers and Orders as documents, and can store the relationships between those entities as RDF Triples.
As a multi-model database, MarkLogic combines the benefits of a document store and an RDF Triple Store. This approach is ideal for integrating and accessing all of your data. JSON and XML documents provide incredible flexibility for modeling entities, while RDF triples — the data format for semantic graph data — are ideal for storing relationships.
It's up to you. If you want to use XML for some facts and triples for others, you can transform selected facts from XML to triples, and combine those in the same documents. For the XML you presented, that's how I'd start. As you insert or update each document in the original XML format, pass it through XQuery that adds new triples. I'd keep those new triples in the same document with the original XML.
You could do this using CPF: http://docs.marklogic.com/guide/cpf - or with a tool like http://marklogic.github.io/recordloader/ and its XccModuleContentFactory class.
But if you want to get away from the original XML format entirely, you could do that. Then you would translate your XML into triples and ingest those triples instead of the original XML. Or you can also have pure XML documents and pure triple documents in the same database.
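A minimal sketch of the "same document" approach, as a hedged illustration: the subject and predicate IRIs under unique/uri/ are made-up placeholders, not a MarkLogic convention, so adapt them to your own vocabulary:

```xquery
xquery version "1.0-ml";
declare namespace sem = "http://marklogic.com/semantics";

(: Wrap the original record together with triples derived from it.
   Any sem:triple element in the database is picked up by the
   triple index (when that index is enabled). :)
declare function local:enrich($doc as element(root)) as element(envelope)
{
  let $subject := concat("unique/uri/person/", string($doc/ID))
  return
    <envelope>
      { $doc }
      <sem:triples>
        <sem:triple>
          <sem:subject>{ $subject }</sem:subject>
          <sem:predicate>unique/uri/Name</sem:predicate>
          <sem:object>{ string($doc/NAME) }</sem:object>
        </sem:triple>
        <sem:triple>
          <sem:subject>{ $subject }</sem:subject>
          <sem:predicate>unique/uri/Location</sem:predicate>
          <sem:object>{ string($doc/CITY) }</sem:object>
        </sem:triple>
      </sem:triples>
    </envelope>
};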
As Michael says, there are many ways you could go with this. That's because MarkLogic 7 is so flexible: you can express information as triples or as XML (or as JSON, or ...) and mix'n'match data models and query languages.
The first thing to figure out is: what are you trying to achieve? If you just want to get your feet wet with MarkLogic's mix of XML and triples, here's what I'd suggest:
- Ingest your XML documents as above. If you have something text-heavy, such as a description of the account or a free-text annotation, so much the better.
- Using XQuery or XSLT, add a triple to each document that represents the city, e.g. for the sample document you posted, add:
  --this document URI--  unique/uri/Location  "New York"
- Import triples from the web that map city names to states and zip codes (e.g. from geonames).
- Now, with a mixture of SPARQL and XQuery, you can search for e.g. the current balance of every account in a given zip code (even though your documents don't contain zip codes).
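The last step could be sketched roughly as below. This assumes a hypothetical hasZip predicate coming from the imported geonames-style triples; the exact join depends entirely on the vocabulary those triples actually use:

```xquery
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

(: 1. SPARQL: find the city names carrying the target zip code.
   The predicate IRI here is hypothetical. :)
let $solutions :=
  sem:sparql('
    SELECT ?city
    WHERE { ?city <unique/uri/hasZip> "10001" . }
  ')
let $cities :=
  for $s in $solutions
  return string(map:get($s, "city"))

(: 2. XQuery: search the documents whose CITY element matches
   one of those cities, and return the balances. :)
return
  cts:search(collection(),
    cts:element-value-query(xs:QName("CITY"), $cities))
    /root/CURRENT_BALANCE
```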
The documentation gives a good description of loading triples from external sources using mlcp.
See http://docs.marklogic.com/guide/semantics/setup
and for more detail on loading triples see http://docs.marklogic.com/guide/semantics/loading
Note too that you can now run either XQuery or SPARQL (or SQL) queries directly from Query Console at http://your-host:8000/qconsole/