I saved a bunch of content in MarkLogic as binary format documents instead of XML. When I decode the document, it's XML. The side-effect of this error is that my searches don't include those documents.
Is there a way to convert the format of a document in-situ? If not, is there a way to do some kind of mass conversion? Any other ideas on how I can resolve this?
I know how to list all the URIs for binary documents:
xquery version "1.0-ml";
declare namespace qry = "http://marklogic.com/cts/query";
let $binary-term :=
xdmp:plan(/binary())//qry:term-query/qry:key/text()
let $binary_uris := cts:uris((), (), cts:term-query($binary-term))
return $binary_uris
and I know how to decode the documents:
xdmp:binary-decode(fn:doc($uri)/node(), "UTF-8")
but what I don't know is what to do after that. I can loop over that list of $binary_uris and decode them, but how do I take that result and overwrite the existing document in a batch process?
Depending upon how your docs were saved as binary() nodes, you might be able to use xdmp:quote() and then xdmp:unquote().
Below is a quick proof of concept that shows how content that was saved as binary can be turned back into either text or XML:
xquery version "1.0-ml";
xdmp:document-insert("/test.xml",
  binary{ xs:hexBinary(xs:base64Binary(xdmp:base64-encode(xdmp:quote(<doc>test</doc>)))) }),
xdmp:document-insert("/test.txt",
  binary{ xs:hexBinary(xs:base64Binary(xdmp:base64-encode(xdmp:quote("test")))) })
;
for $ext in ("xml", "txt")
let $doc := doc("/test." || $ext)
where $doc/node() instance of binary()
(: you could also restrict to docs whose URIs end with .xml, .txt, etc. :)
return
  let $doc-text := xdmp:quote($doc)
  let $doc-decoded :=
    if (fn:starts-with($doc-text, "<"))
    then xdmp:unquote($doc-text)
    else $doc-text
  return
    $doc-decoded
;
xdmp:document-delete("/test.xml"),
xdmp:document-delete("/test.txt")
If you wanted to "fix" the documents, you could then use xdmp:node-replace() to replace the binary() node with the decoded document:
xdmp:node-replace($doc/node(), $doc-decoded)
You could run a batch job, using the MarkLogic Java DMSDK or a CORB job to select those docs and re-save them.
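For a smaller set of documents, the pieces above can be combined into a single update query. The following is only a rough sketch, not a tested solution: it reuses the term-query lookup and xdmp:binary-decode() call from the question, assumes the URI lexicon is enabled and the content is UTF-8, and re-inserts each document at the same URI with its existing permissions and collections. It is also an assumption that overwriting via xdmp:document-insert() is acceptable for your data (properties and quality are not carried over here).

```xquery
xquery version "1.0-ml";
declare namespace qry = "http://marklogic.com/cts/query";

(: Term key that matches binary documents, as in the question. :)
let $binary-term := xdmp:plan(/binary())//qry:term-query/qry:key/text()

(: Assumption: the URI lexicon is enabled, which cts:uris requires. :)
for $uri in cts:uris((), (), cts:term-query($binary-term))

(: Assumption: the binary content is UTF-8 encoded text. :)
let $text := xdmp:binary-decode(fn:doc($uri)/node(), "UTF-8")

(: Only touch documents whose decoded content looks like XML. :)
where fn:starts-with(fn:normalize-space($text), "<")
return
  xdmp:document-insert(
    $uri,
    (: Parse the decoded string back into an XML document node. :)
    xdmp:unquote($text),
    (: Keep the existing permissions and collections. :)
    xdmp:document-get-permissions($uri),
    xdmp:document-get-collections($uri))
```

Everything here runs in one transaction, so a large number of documents may hit the request time limit, and a single document that fails to parse in xdmp:unquote() will roll back the whole run. For large volumes, the same per-document logic would more safely go into a CORB or DMSDK job that handles one URI (or a small batch) at a time.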