
Work with JSON schema in CouchDB

I would like to ask about good practices for working with JSON schemas in CouchDB. I am using plain CouchDB 1.6.1 at the moment, without any couchapp framework (I know such frameworks are useful, but I am concerned about whether they will keep working in the future).

  • Where should I put schemas in CouchDB? In a regular document? In a design document? Or maybe store them as files? If I want to validate against them, especially server-side in the validate_doc_update function, they should be stored in design documents.

  • Is there any library (JavaScript would be best) that works both in CouchDB and on the client (web browser)? A library with which I could generate JSON documents and validate them automatically?

  • I am thinking about how to send data to the client, store it in input tags, then collect it somehow and send it back to the server. Maybe set each input's id to the path of the field, for example:

    { "Adress" :{ "Street" : "xxx", "Nr" : "33" } }

In that case the input could have id = "Adress.Street", but I do not know whether this is a good solution. I should send the schema from the server and build the JSON object using this schema, but I have no idea how (assuming that all fields in the JSON have unique names, including their hierarchy).
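To make the dotted-id idea concrete, here is a rough sketch of how such ids could be turned back into a nested object. The helper names setPath and collectFields are hypothetical, and the list of {id, value} pairs stands in for whatever is read from the input elements in the browser.

```javascript
// Hypothetical helper: writes value into target at a dotted path such as "Adress.Street".
function setPath(target, path, value) {
  var keys = path.split(".");
  var node = target;
  for (var i = 0; i < keys.length - 1; i++) {
    // Create intermediate objects on the way down if they do not exist yet.
    if (typeof node[keys[i]] !== "object" || node[keys[i]] === null) {
      node[keys[i]] = {};
    }
    node = node[keys[i]];
  }
  node[keys[keys.length - 1]] = value;
  return target;
}

// Hypothetical helper: builds one document from a list of {id, value} pairs,
// e.g. collected from the input elements of a form.
function collectFields(fields) {
  var doc = {};
  fields.forEach(function (field) {
    setPath(doc, field.id, field.value);
  });
  return doc;
}

var doc = collectFields([
  { id: "Adress.Street", value: "xxx" },
  { id: "Adress.Nr", value: "33" }
]);
console.log(JSON.stringify(doc));
```

This only works as long as no field name itself contains a dot, which is one reason the approach may or may not fit a given schema.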

InnerWorld asked Dec 19 '14 09:12



2 Answers

You are asking the same question I had for years while exploring the potential advantages of CouchDB in forms-over-data use cases.

Initially my hope was to find an approach that enables data validation based on the same JSON schema definition and the same validation code, server-side and client-side. It turned out that this is not only possible but also brings some additional advantages.

Where should I put schemas in CouchDB? In a regular document? In a design document? Or maybe store them as files? If I want to validate against them, especially server-side in the validate_doc_update function, they should be stored in design documents.

You are right. The design doc (ddoc) that also contains the validate_doc_update function, which runs the validation before each doc update, is the most common place to put the schemata. Inside the validate_doc_update function, this refers to the ddoc itself, so everything included in the ddoc can be accessed from the validation code.

I began by storing the schemata as a JSON object in my general library property/folder for CommonJS modules, e.g. lib/schemata.json. The type property of my docs specified the key of the schema that the doc update validation should fetch, e.g. type: 'adr' -> lib/schemata/adr. A schema could also refer to other schemata per property: a recursive validation function traversed to the end of every property, no matter what type the nested properties were. This worked well in the first project.

{
  "person": {
    "name": "/type/name",
    "adr": "/type/adr",
    ...
  },
  "name": {
    "forname": {
      "minlength": 2,
      "maxlength": 42,
      ...
    },
    "surname": {
      ...
    }
  },
  "adr": {
    ...
  }
}
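The recursive traversal can be sketched roughly as follows. This is not the original implementation, only an illustration under assumptions: the function name validate, the error strings, and the sample values are made up, and only the "/type/..." references and the minlength/maxlength rules from the structure above are handled.

```javascript
// Illustrative schemata in the shape shown above (rule values are made up).
var schemata = {
  person: { name: "/type/name" },
  name: {
    forname: { minlength: 2, maxlength: 42 },
    surname: { minlength: 2, maxlength: 42 }
  }
};

// Hypothetical recursive validator: returns a list of error strings.
function validate(value, typeName, schemata, path) {
  var schema = schemata[typeName];
  if (!schema) return [path + ": unknown type " + typeName];
  if (typeof value !== "object" || value === null) return [path + ": expected an object"];
  var errors = [];
  Object.keys(schema).forEach(function (prop) {
    var rule = schema[prop];
    var propPath = path ? path + "." + prop : prop;
    if (typeof rule === "string") {
      // "/type/name" refers to another schema: recurse into the nested value.
      var nestedType = rule.replace(/^\/type\//, "");
      errors = errors.concat(validate(value[prop], nestedType, schemata, propPath));
    } else {
      var text = value[prop];
      if (typeof text !== "string") {
        errors.push(propPath + ": expected a string");
      } else if (rule.minlength !== undefined && text.length < rule.minlength) {
        errors.push(propPath + ": shorter than " + rule.minlength);
      } else if (rule.maxlength !== undefined && text.length > rule.maxlength) {
        errors.push(propPath + ": longer than " + rule.maxlength);
      }
    }
  });
  return errors;
}

var ok = validate({ name: { forname: "Ada", surname: "Lovelace" } }, "person", schemata, "");
var bad = validate({ name: { forname: "A", surname: "Lovelace" } }, "person", schemata, "");
console.log(ok, bad);
```

In a ddoc, the type name would come from the type property of the doc being validated instead of being passed in by hand.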

But then I wanted to use a subset of those schemata in another project. Simply copying them over and adding or removing some schemata would have been too short-sighted. What if a general schema, such as the one for an address, has a bug and needs to be updated in every project that uses it?

At this point my schemata were stored in one file in the repository (I use erica as the upload tool for ddocs). Then I realized that storing every schema in a separate file, e.g. adr.json, geo.json, tel.json etc., results in the same JSON structure in the server's ddoc as the single-file approach. But it is better suited to source code management: smaller files lead to fewer merge conflicts and a cleaner commit history, and they also enable schema dependency management via sub-repositories (submodules).

Another thought was to use CouchDB itself as the place to store and manage schemata. But as you mentioned yourself, the schemata have to be accessible in the validate_doc_update function. First I tried an approach with an update handler: every doc update has to pass through a validation update handler that fetches the right schema from CouchDB itself:

POST /_design/validator/_update/doctype/person

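function (schema, req) {
  // schema is the doc addressed by the URL, here the schema doc "person"
  var newDoc = JSON.parse(req.body); // req.body arrives as a string
   ... //validate newDoc against schema "person"
  return [newDoc, {code: 202, headers: ...}];
}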

But that approach doesn't work well with nested schemata. Even worse, to prevent doc updates that bypass the validation handler I had to put a proxy in front of CouchDB to hide the direct built-in doc update paths (e.g. POST to/the/doc/_id). I didn't find a way to detect in the validate_doc_update function whether the update handler had been involved or not (maybe someone else has? I would be glad to read such a solution).

During that investigation the problem of different versions of the same schema showed up on my radar. How should I manage that? Must all docs of the same type be valid against the same schema version (which would mean a db-wide data migration before nearly every schema version change)? Should the type property also include a version number? etc.

But wait! What if the schema of a document is attached to the document itself? It:

  • will provide the version compatible with the doc contents, per doc
  • will be accessible in the validate_doc_update function (in oldDoc)
  • can be replicated without administrator access rights (which you need for ddoc updates)
  • will be included in every response to a client-side doc request

That sounded very interesting, and it feels to me like the most CouchDB-ish approach so far. To be clear: "the schema of a document is attached to the document itself" means storing it in a property of the doc. Neither storing it as an attachment nor using the schema itself as the doc structure was successful.
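A minimal sketch of how a validate_doc_update function could use a schema stored in the doc. The property name schema, the helper missingFields (a stand-in for a real validator), and the rule that the schema must not change during an update are assumptions for illustration, not part of the approach as described.

```javascript
// Hypothetical stand-in for a real validator: every property named in
// schema.required must be present in the doc.
function missingFields(doc, schema) {
  return (schema.required || []).filter(function (key) {
    return doc[key] === undefined;
  });
}

// Sketch of a validate_doc_update function. CouchDB calls it with
// (newDoc, oldDoc, userCtx, secObj); only the first two are used here.
function validateDocUpdate(newDoc, oldDoc) {
  if (newDoc._deleted) return;
  // On updates, use the schema stored with the previous revision;
  // on creation there is no oldDoc, so the schema comes from newDoc itself.
  var schema = oldDoc ? oldDoc.schema : newDoc.schema;
  if (!schema) throw { forbidden: "document carries no schema" };
  // Assumption: a normal update may not swap the schema. The string
  // comparison is a simplification and is sensitive to key order.
  if (oldDoc && JSON.stringify(newDoc.schema) !== JSON.stringify(oldDoc.schema)) {
    throw { forbidden: "schema must not change in a normal update" };
  }
  var missing = missingFields(newDoc, schema);
  if (missing.length > 0) throw { forbidden: "missing: " + missing.join(", ") };
}
```

Inside a real ddoc the helper would have to live inside the function body or in a CommonJS module, because only the function source itself is stored under validate_doc_update.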

The most sensitive moment of that approach is the C (create) in the CRUD life cycle of a doc. Many different solutions are imaginable to ensure that the attached schema is "correct and acceptable", but it depends on how you define those terms in your particular project.

Is there any library (JavaScript would be best) that works both in CouchDB and on the client (web browser)? A library with which I could generate JSON documents and validate them automatically?

I began implementing it with the popular jQuery Validation plugin. I could use the schema as its configuration and got neat client-side validation automatically. On the server side I extracted the validation functions into a CommonJS module. I expected to find a modular way of managing the code later that would prevent code duplication.

It turned out that most of the existing validation frameworks are very good at pattern matching and single-property validation but are not capable of validating against dependent values in the same document. Also, their schema definition requirements are often too proprietary. For me the rule of thumb for choosing the right schema definition is: prefer a standardized definition (json-schema.org, microdata, RDFa, hCard etc.) over your own. If you leave the structure and property names as they are, you will need less documentation and less transformation, and sometimes you automatically get compatibility with other software your users use (e.g. calendars, address books etc.). If you want to implement an HTML presentation for your docs, you are well prepared to do it in a semantic-web-ish and SEO-friendly way.
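To illustrate what validating against dependent values in the same document means, here is a small sketch. The rule format (one function per property that receives the value and the whole doc) and the names rules and validateDoc are assumptions made up for this example, not taken from any particular library.

```javascript
// Hypothetical rule set: each rule receives the property value and the whole
// document, so it can compare against other properties.
var rules = {
  start: function (value) {
    return typeof value === "string" ? null : "start must be a string";
  },
  end: function (value, doc) {
    // Date strings in the same YYYY-MM-DD format compare correctly as strings.
    return value >= doc.start ? null : "end must not be before start";
  }
};

// Runs every rule and collects the error messages.
function validateDoc(doc, rules) {
  var errors = [];
  Object.keys(rules).forEach(function (prop) {
    var message = rules[prop](doc[prop], doc);
    if (message) errors.push(message);
  });
  return errors;
}

var fine = validateDoc({ start: "2014-12-19", end: "2014-12-20" }, rules);
var broken = validateDoc({ start: "2014-12-19", end: "2014-12-01" }, rules);
console.log(fine, broken);
```

Because the rules are plain functions, the same file could in principle be loaded in the browser and as a CommonJS module in the ddoc.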

And finally, without wishing to sound arrogant: writing a schema validation implementation is not difficult. Maybe you want to read the source code of the jQuery Validation plugin; I'm sure you will find it surprisingly comprehensible, as I did. At a time when the churn rate of front-end frameworks is increasing, having your own validation function may be the most future-proof way. I also believe you should understand the validation implementation 100%, since it is a critical part of your application. And if you understand a foreign implementation, you can also write the library yourself.

OK. That is a loooong answer. Sorry. If someone reads this to the end and wants to see it in action in detail, with example source code, upvote and I will write a blog post and append the URI as a comment.

Ingo Radatz answered Oct 20 '22 23:10


I'll tell you how I'm implementing it.

  1. I have a database per document type, which allows me to implement one schema per database.

  2. In each database I have a _design/schema ddoc, which contains a schema and a validate_doc_update function to validate against it.

  3. I'm using Tiny Validator (for v4 JSON Schema), which I include right in the _design/schema ddoc.


The _design/schema ddoc looks like this:

{
  "_id": "_design/schema",
  "libs": {
    "tv4": "// Code from https://raw.githubusercontent.com/geraintluff/tv4/master/tv4.min.js"
  },
  "validate_doc_update": "...",
  "schema": {
    "title": "Blog",
    "description": "A document containing a single blog post.",
    "type": "object",
    "required": ["title", "body"],
    "properties": {
      "_id": {
        "type": "string"
      },
      "_rev": {
        "type": "string"
      },
      "title": {
        "type": "string"
      },
      "body": {
        "type": "string"
      }
    }
  }
}

The validate_doc_update function looks like this:

function(newDoc) {
  if (newDoc['_deleted']) return;

  var tv4 = require('libs/tv4');

  if (!tv4.validate(newDoc, this.schema)) {
    throw({forbidden: tv4.error.message + ' -> ' + tv4.error.dataPath});
  }
}
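Since the ddoc stores both the library and the function as strings, one possible way to assemble it is sketched below. The helper name buildSchemaDdoc and its parameters are hypothetical, and tv4Source is assumed to be the text of tv4.min.js, loaded by the caller however they like.

```javascript
// The validation function from above, kept as a real function so it can be
// linted and reviewed before it is serialized into the ddoc.
var validateDocUpdate = function (newDoc) {
  if (newDoc['_deleted']) return;

  var tv4 = require('libs/tv4');

  if (!tv4.validate(newDoc, this.schema)) {
    throw({forbidden: tv4.error.message + ' -> ' + tv4.error.dataPath});
  }
};

// Hypothetical helper: returns the ddoc as a plain object that can be
// sent to the database as JSON.
function buildSchemaDdoc(schema, tv4Source) {
  return {
    _id: "_design/schema",
    libs: { tv4: tv4Source },
    // Function.prototype.toString yields the source text that CouchDB stores.
    validate_doc_update: validateDocUpdate.toString(),
    schema: schema
  };
}

var ddoc = buildSchemaDdoc(
  { type: "object", required: ["title", "body"] },
  "/* contents of tv4.min.js go here */"
);
console.log(JSON.stringify(ddoc, null, 2));
```

When updating an existing ddoc, the current _rev would also have to be included, which this sketch leaves out.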

Hope this helps.

mrded answered Oct 20 '22 23:10