I have the following challenge. We have CSV files that we want to load into a MarkLogic database using mlcp. We also want to transform the rows into OBI sources during the load, so we built a transform function for that.
Now I am struggling with the transform. Without the transform, the data loads as one document per row, as expected.
CSV example:
voornaam,achternaam
hugo,koopmans
thijs,van ulden
transform-ambulance.xqy:
xquery version "1.0-ml";
module namespace rws = "http://marklogic.com/rws";
import module namespace source = "http://marklogic.com/solutions/obi/source" at "/ext/obi/lib/source-lib.xqy";
(: If the input document is XML, create an OBI source from it, with the value
 : specified in the input parameter. If the input document is not
 : XML, leave it as-is.
 :)
declare function rws:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $attr-value :=
    (map:get($context, "transform_param"), "UNDEFINED")[1]
  let $the-doc := map:get($content, "value")
  return
    if (fn:empty($the-doc/element()))
    then $content
    else
      let $root := xdmp:unquote($the-doc/*)
      let $source-title := "ambulance source data"
      let $collection := 'ambulance'
      let $source-id := source:create-source($source-title, (),$root)
      let $_ := xdmp:document-add-collections(concat("/marklogic.solutions.obi/source/", $source-id[1],".xml"), $collection)
      return (
        map:put($content, "value",
          $source-id[2]
        ), $content
      )
};
mlcp command:
mlcp.sh import \
-host localhost \
-port 27041 \
-username admin \
-password admin \
-input_file_path ./sampledata/so-example.csv \
-input_file_type delimited_text \
-transform_module /transforms/transform-ambulance.xqy \
-transform_namespace "http://marklogic.com/rws" \
-mode local
mlcp output:
15/09/08 21:35:08 INFO contentpump.ContentPump: Hadoop library version: 2.6.0
15/09/08 21:35:08 INFO contentpump.LocalJobRunner: Content type: XML
15/09/08 21:35:08 INFO input.FileInputFormat: Total input paths to process : 1
15/09/08 21:35:10 WARN mapreduce.ContentWriter: XDMP-DOCROOTTEXT: xdmp:unquote(document{<root><voornaam>hugo</voornaam><achternaam>koopmans</achternaam></root>}) -- Invalid root text "hugokoopmans" at line 1
15/09/08 21:35:10 WARN mapreduce.ContentWriter: XDMP-DOCROOTTEXT: xdmp:unquote(document{<root><voornaam>thijs</voornaam><achternaam>van ulden</achternaam></root>}) -- Invalid root text "thijsvan ulden" at line 1
15/09/08 21:35:11 INFO contentpump.LocalJobRunner: completed 100%
15/09/08 21:35:11 INFO contentpump.LocalJobRunner: com.marklogic.contentpump.ContentPumpStats:
15/09/08 21:35:11 INFO contentpump.LocalJobRunner: ATTEMPTED_INPUT_RECORD_COUNT: 2
15/09/08 21:35:11 INFO contentpump.LocalJobRunner: SKIPPED_INPUT_RECORD_COUNT: 0
15/09/08 21:35:11 INFO contentpump.LocalJobRunner: Total execution time: 2 sec
I have tried it without the xdmp:unquote(), but then I hit a document-node() coercion error...
Please advise...
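OK, so the issue was that $root has to be a document-node(). xdmp:unquote() expects a string, so passing it the <root> element atomizes the element to its text content ("hugokoopmans"), which is not well-formed XML; that is the XDMP-DOCROOTTEXT error in the log. Without xdmp:unquote(), $the-doc/* is an element rather than a document node, which causes the coercion error. Wrapping the row element in a document constructor:
let $root := document {$the-doc/root}
solves the issue.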
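For reference, here is a sketch of the whole transform with that one-line change applied. It assumes, as in the question, that mlcp passes each CSV row to the transform as a document node whose root element is <root>. The call to source:create-source and the use of its result (first item as the source id, second item as the new document value) are copied from the question's code; the OBI library signature itself is not verified here. The unused transform_param lookup is left out.

```xquery
xquery version "1.0-ml";

module namespace rws = "http://marklogic.com/rws";

(: OBI source library, path as in the question :)
import module namespace source = "http://marklogic.com/solutions/obi/source"
  at "/ext/obi/lib/source-lib.xqy";

declare function rws:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $the-doc := map:get($content, "value")
  return
    if (fn:empty($the-doc/element()))
    then $content (: not XML: pass the record through unchanged :)
    else
      (: wrap the row element in a new document node instead of unquoting it :)
      let $root := document { $the-doc/root }
      (: OBI call and result handling as in the question (not verified) :)
      let $source-id := source:create-source("ambulance source data", (), $root)
      let $_ := xdmp:document-add-collections(
        fn:concat("/marklogic.solutions.obi/source/", $source-id[1], ".xml"),
        "ambulance")
      (: map:put returns the empty sequence, so only the updated map is returned :)
      return (map:put($content, "value", $source-id[2]), $content)
};
```

Only the $root binding differs in substance from the original module, so the mlcp command above can stay unchanged.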