
Marklogic 7: Semantic Search

Tags:

rdf

semantic-web

triplestore

marklogic

I have been exploring the RDF triple store and semantic search capabilities of MarkLogic 7, and querying them with SPARQL. I was able to perform some basic operations, such as:


xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

sem:rdf-insert(
  sem:triple(
    sem:iri("http://example.org/ns/people#m"),
    sem:iri("http://example.com/ns/person#firstName"),
    "Sam"),
  (), (), "my collection")

which creates a triple, which I can then query using the following SPARQL:


PREFIX ab: <http://example.org/ns/people#>
PREFIX ac: <http://example.com/ns/person#>
SELECT ?Name
WHERE
{ ab:m ac:firstName ?Name . }

which retrieves "Sam" as the result.

Edited: In my use case, I have a delimited file (structured data) with 1 billion records that I ingested into MarkLogic using MLCP. Each record is stored as a document like this:


<root>
<ID>1000-000-000--000</ID>
<ACCOUNT_NUM>9999</ACCOUNT_NUM>
<NAME>Vronik</NAME>
<ADD1>D7-701</ADD1>
<ADD2>B-Valentine</ADD2>
<ADD3>Street 4</ADD3>
<ADD4>Fifth Avenue</ADD4>
<CITY>New York</CITY>
<STATE>NY</STATE>
<HOMPHONE>0002600000</HOMPHONE>
<BASEPHONE>12345</BASEPHONE>
<CELLPHONE>54321</CELLPHONE>
<EMAIL_ADDR>abc@gmail.com</EMAIL_ADDR>
<CURRENT_BALANCE>10000</CURRENT_BALANCE>
<OWNERSHIP>JOINT</OWNERSHIP>
</root>

Now I want to use the RDF/semantic features on the dataset above. However, I can't work out whether I need to convert each document to RDF as shown below (shown for <NAME>, assuming this is the right way):


  <sem:triple>
    <sem:subject>unique/uri/Person</sem:subject>
    <sem:predicate>unique/uri/Name</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string"
                xml:lang="en">Vronik</sem:object>
  </sem:triple>

and then ingest these docs into MarkLogic and search them using SPARQL, or whether I should just ingest my documents as they are, separately ingest triples obtained from external sources, somehow link those triples to my documents (how?), and then query using SPARQL. Or is there some other way I ought to do this?


asked Nov 19 '13 14:11

Shrey Shivam

2 Answers

It's up to you. If you want to use XML for some facts and triples for others, you can transform selected facts from XML to triples, and combine those in the same documents. For the XML you presented, that's how I'd start. As you insert or update each document in the original XML format, pass it through XQuery that adds new triples. I'd keep those new triples in the same document with the original XML.

You could do this using CPF: http://docs.marklogic.com/guide/cpf - or with a tool like http://marklogic.github.io/recordloader/ and its XccModuleContentFactory class.

But if you want to get away from the original XML format entirely, you could do that too: translate your XML into triples and ingest those triples instead of the original XML. You can also keep pure XML documents and pure triple documents in the same database.
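As a rough illustration of the "translate your XML into triples" option, here is a minimal Python sketch that turns one record of the shape shown in the question into N-Triples lines before loading. It is an offline preprocessing sketch, not MarkLogic code: the two base IRIs and the function names are assumptions made up for the example, and every field is emitted as a plain string literal.

```python
import xml.etree.ElementTree as ET

# Hypothetical base IRIs; the original post does not define real ones.
SUBJECT_BASE = "http://example.org/ns/account/"
PREDICATE_BASE = "http://example.org/ns/account#"


def escape_literal(value):
    """Escape the characters that must be escaped inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(xml_text):
    """Return one N-Triples line per non-empty field of a <root> record."""
    root = ET.fromstring(xml_text)
    record_id = (root.findtext("ID") or "").strip()
    if not record_id:
        raise ValueError("record has no <ID> to build a subject IRI from")
    subject = f"<{SUBJECT_BASE}{record_id}>"
    lines = []
    for field in root:
        text = (field.text or "").strip()
        if field.tag == "ID" or not text:
            continue  # the ID is already the subject; skip empty fields
        predicate = f"<{PREDICATE_BASE}{field.tag}>"
        lines.append(f'{subject} {predicate} "{escape_literal(text)}" .')
    return lines


if __name__ == "__main__":
    sample = "<root><ID>1000-000-000--000</ID><NAME>Vronik</NAME><CITY>New York</CITY></root>"
    print("\n".join(record_to_ntriples(sample)))
```

The lines this produces could be written to a `.nt` file and loaded as RDF. Whether one triple per field is the right model depends on which facts you actually want to query with SPARQL.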


answered Oct 01 '22 03:10

mblakele

As Michael says, there are many ways you could go with this. That's because MarkLogic 7 is so flexible: you can express information as triples or as XML (or as JSON or ...) and mix and match data models and query languages.

The first thing to figure out is - what are you trying to achieve? If you just want to get your feet wet with MarkLogic's mix of XML and triples, here's what I'd suggest:

1. Ingest your XML documents as above. If you have something text-heavy, such as a description of the account or a free-text annotation, so much the better.
2. Using XQuery or XSLT, add a triple to each document that represents the city. For the sample document you posted, add:

   --this document URI-- unique/uri/Location New York

3. Import triples from the web that map city names to states and zip codes (e.g. from GeoNames).
4. Now, with a mixture of SPARQL and XQuery, you can search for e.g. the current balance of every account in some zip code (even though your documents don't contain zip codes).
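The answer suggests XQuery or XSLT for the "add a triple to each document" step. If you would rather do it before loading, the same idea can be sketched in Python. This is only an illustration under some assumptions: the function name and the document URI passed in are hypothetical, the predicate is the placeholder from the list above, and it relies on MarkLogic picking up `sem:triple` elements embedded in a document when the triple index is enabled.

```python
import xml.etree.ElementTree as ET

SEM_NS = "http://marklogic.com/semantics"   # MarkLogic's semantics namespace
LOCATION_PREDICATE = "unique/uri/Location"  # placeholder predicate from the answer

# Serialize the namespace with the conventional "sem" prefix.
ET.register_namespace("sem", SEM_NS)


def add_city_triple(xml_text, doc_uri):
    """Append a sem:triple (document URI, Location, city) to a <root> record."""
    root = ET.fromstring(xml_text)
    city = (root.findtext("CITY") or "").strip()
    if not city:
        return xml_text  # no city, so leave the document unchanged
    triple = ET.SubElement(root, f"{{{SEM_NS}}}triple")
    ET.SubElement(triple, f"{{{SEM_NS}}}subject").text = doc_uri
    ET.SubElement(triple, f"{{{SEM_NS}}}predicate").text = LOCATION_PREDICATE
    obj = ET.SubElement(triple, f"{{{SEM_NS}}}object")
    obj.set("datatype", "http://www.w3.org/2001/XMLSchema#string")
    obj.text = city
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = "<root><ID>1000-000-000--000</ID><CITY>New York</CITY></root>"
    # "/accounts/1000-000-000--000.xml" is a made-up document URI.
    print(add_city_triple(doc, "/accounts/1000-000-000--000.xml"))
```

Because the triple's subject is the document URI, a SPARQL query that joins on the city name can lead back to the original XML document, which is what makes the zip-code search in the last step possible.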

The documentation gives a good description of loading triples from external sources using mlcp.

See http://docs.marklogic.com/guide/semantics/setup

and for more detail on loading triples see http://docs.marklogic.com/guide/semantics/loading

Note too that you can now run either XQuery or SPARQL (or SQL) queries directly from Query Console at http://your-host:8000/qconsole/

answered Oct 01 '22 04:10

SBuxton
