
Convert XML document stored as binary to XML?

Tags:

marklogic

I saved a bunch of content in MarkLogic as binary format documents instead of XML. When I decode the document, it's XML. The side-effect of this error is that my searches don't include those documents.

Is there a way to convert the format of a document in-situ? If not, is there a way to do some kind of mass conversion? Any other ideas on how I can resolve this?

I know how to list all the URIs for binary documents:

xquery version "1.0-ml";
declare namespace qry  = "http://marklogic.com/cts/query";
let $binary-term :=
  xdmp:plan(/binary())//qry:term-query/qry:key/text()
let $binary_uris := cts:uris((), (), cts:term-query($binary-term))
return $binary_uris

and I know how to decode the documents:

xdmp:binary-decode(fn:doc($uri)/node(), "UTF-8")

but what I don't know is what to do after that. I can loop over that list of $binary_uris and decode them, but how do I take that result and overwrite the existing document in a batch process?

Steve Anderson asked Jan 21 '26 05:01



1 Answer

Depending upon how your docs were saved as binary() nodes, you might be able to use xdmp:quote() and then xdmp:unquote().

Below is a quick proof of concept that shows how content that was saved as binary can be turned back into either text or XML:

xquery version "1.0-ml";
xdmp:document-insert("/test.xml", 
  binary{ xs:hexBinary(xs:base64Binary(xdmp:base64-encode(xdmp:quote(<doc>test</doc>))))}),
xdmp:document-insert("/test.txt", 
  binary{ xs:hexBinary(xs:base64Binary(xdmp:base64-encode(xdmp:quote("test" ))))})
;
for $ext in ("xml", "txt")
let $doc := doc("/test." || $ext)
where $doc/node() instance of binary() 
      (: you could also restrict to docs whose URIs end with .xml, .txt, etc. :)
return
  let $doc-text := xdmp:quote($doc)
  let $doc-decoded :=
    if (fn:starts-with($doc-text, "&lt;")) 
    then xdmp:unquote($doc-text)
    else $doc-text 
  return
    $doc-decoded
;
xdmp:document-delete("/test.xml"),
xdmp:document-delete("/test.txt")

If you wanted to "fix" the documents, you could then use xdmp:node-replace() to replace the binary() node with the decoded document:

xdmp:node-replace($doc/node(), $doc-decoded)

You could run a batch job, using the MarkLogic Java DMSDK or CORB, to select those docs and re-save them.
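
For a modest number of documents, the selection from the question and the node replacement above can be combined in a single module. The following is an untested sketch, not a definitive implementation. It assumes the stored bytes are UTF-8, that anything whose decoded text starts with "<" is meant to be XML, and that everything else should be left alone. The log message wording is made up for illustration.

```xquery
xquery version "1.0-ml";
declare namespace qry = "http://marklogic.com/cts/query";

(: Term key that matches binary documents, as in the question. :)
let $binary-term :=
  xdmp:plan(/binary())//qry:term-query/qry:key/text()
for $uri in cts:uris((), (), cts:term-query($binary-term))
let $binary := fn:doc($uri)/binary()
return
  try {
    (: Assumption: the content was stored as UTF-8 bytes. :)
    let $text := xdmp:binary-decode($binary, "UTF-8")
    return
      if (fn:matches($text, "^\s*<"))
      then
        (: Parse the text and swap the binary root for the XML root element. :)
        xdmp:node-replace($binary, xdmp:unquote($text)/element())
      else
        (: Not XML-looking: leave it untouched and note the URI. :)
        xdmp:log("Skipped (decoded text is not XML): " || $uri)
  } catch ($e) {
    (: Genuine binaries (images, PDFs) may fail to decode or parse. :)
    xdmp:log("Skipped (could not decode or parse): " || $uri)
  }
```

All of the replacements here happen in one transaction, so for a large database it is probably safer to let CORB or DMSDK feed the URIs to a per-document version of the same logic. Try it on a copy of a few documents first and confirm that they show up in searches afterwards.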

Mads Hansen answered Jan 23 '26 21:01



