
Persisting various types of documents (ods, ms office, pdf) into Jackrabbit repository

I'm not sure which approach to choose for storing these types of documents. The key requirement is to gather as much metadata as possible, and PDF, ODS and MS Office documents each carry different kinds of metadata.

If the node tree has a "group/user/category/document" or "category/group/user/document" structure (I'm not sure which is better), each document would need a "type" property saying whether it is pdf/doc/odt/ppt etc., and I would have to test that property every time to know which kinds of metadata the document has, right? That seems very inefficient to me.

lisak asked Jan 21 '23 08:01


1 Answer

I personally would try to avoid structuring your hierarchy to include the file type. That would work, but it seems forced and unnatural.

Instead, I would design the hierarchy to be the most natural one for your application (e.g., if you have groups and users, then maybe "group/user", storing each user's documents under the respective user node), and use properties to capture the file type and additional metadata.

If you upload a file into JCR using the "nt:file" convention, each file is represented by a node (named after the file) of type "nt:file". That node contains a single child node named "jcr:content", and the convention is to use the "nt:resource" node type for this child node. In JCR 2.0, the "nt:resource" node type has these property definitions:

  • jcr:data (BINARY) mandatory
  • jcr:lastModified (DATE) autocreated
  • jcr:lastModifiedBy (STRING) autocreated
  • jcr:mimeType (STRING) protected?
  • jcr:encoding (STRING) protected?

Note that JCR implementations are allowed to treat "jcr:mimeType" and "jcr:encoding" as protected, but neither Jackrabbit nor ModeShape does this (meaning you can, and must, set these properties manually).

Here is a code snippet for uploading a file and setting the "jcr:mimeType" property:

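// Needs imports from java.io (File, FileInputStream, BufferedInputStream, InputStream)
// and javax.jcr (Node, Binary).
// Get an input stream for the file ...
File file = ...
InputStream stream = new BufferedInputStream(new FileInputStream(file));

// Create the nt:file node and its jcr:content child (renamed to avoid clashing with "file")
Node folder = session.getNode("/absolute/path/to/folder/node");
Node fileNode = folder.addNode("Article.pdf", "nt:file");
Node content = fileNode.addNode("jcr:content", "nt:resource");

// Store the bytes and set the MIME type by hand
Binary binary = session.getValueFactory().createBinary(stream);
content.setProperty("jcr:data", binary);
content.setProperty("jcr:mimeType", "application/pdf");

// Nothing is persisted until the session is saved
session.save();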
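
Because "jcr:mimeType" has to be set by hand, the hard-coded "application/pdf" above needs to come from somewhere when you store several formats. One possibility is a small lookup keyed on the file extension. This is only a sketch and not part of the JCR API: the class name MimeTypes, the method forFileName, and the set of extensions covered are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of JCR): maps a file name to a MIME type
// that could be stored in the "jcr:mimeType" property.
final class MimeTypes {
    private static final String FALLBACK = "application/octet-stream";

    // Illustrative subset of formats; extend as needed.
    private static final Map<String, String> BY_EXTENSION = Map.of(
        "pdf", "application/pdf",
        "odt", "application/vnd.oasis.opendocument.text",
        "ods", "application/vnd.oasis.opendocument.spreadsheet",
        "doc", "application/msword",
        "xls", "application/vnd.ms-excel",
        "ppt", "application/vnd.ms-powerpoint");

    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK; // no extension to look up
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, FALLBACK);
    }
}
```

The upload snippet could then pass MimeTypes.forFileName("Article.pdf") to setProperty instead of a literal. Extension-based guessing is a simplification; inspecting the file content would be more reliable but is outside the scope of this sketch.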

Now, out of the box, the "nt:file" and "nt:resource" node types don't allow you to set properties that they don't define. But you can use mixins to get around this limitation and store the metadata directly on these nodes. See my detailed answer to an earlier question describing how to do this.
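
To avoid testing the file type on every read, one option is to keep a single table that says which metadata mixin belongs to which MIME type and apply it once at upload time. The sketch below shows only that table. The mixin names ("meta:pdf", "meta:msOffice", "meta:openDocument") and the "meta" prefix are invented for illustration; they would have to be defined and registered as node types in your repository before they could be used.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup from MIME type to the mixin carrying that format's metadata.
// All mixin names here are made up and must be registered in the repository first.
final class MetadataMixins {
    private static final Map<String, String> BY_MIME_TYPE = Map.of(
        "application/pdf", "meta:pdf",
        "application/msword", "meta:msOffice",
        "application/vnd.ms-excel", "meta:msOffice",
        "application/vnd.ms-powerpoint", "meta:msOffice",
        "application/vnd.oasis.opendocument.text", "meta:openDocument",
        "application/vnd.oasis.opendocument.spreadsheet", "meta:openDocument");

    // Returns the mixin name for a MIME type, or empty if none is configured.
    static Optional<String> forMimeType(String mimeType) {
        if (mimeType == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(BY_MIME_TYPE.get(mimeType));
    }
}
```

With such a table, the upload code could pass the returned name to Node.addMixin(String) on the "jcr:content" node and then set that mixin's properties. Code that reads the document later can call Node.isNodeType(String) to find out which metadata is present, rather than comparing a "type" property by hand.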

Randall Hauch answered Feb 15 '23 19:02