Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to validate a JSON object against a JSON schema based on object's type described by a field?

Tags:

jsonschema

I have a number of objects (messages) that I need to validate against a JSON schema (draft-04). Each objects is guaranteed to have a "type" field, which describes its type, but every type have a completely different set of other fields, so each type of object needs a unique schema.

I see several possibilities, none of which are particularly appealing, but I hope I'm missing something.

Possibility 1: Use oneOf for each message type. I guess this would work, but the problem is very long validation errors in case something goes wrong: validators tend to report every schema that failed, which include ALL elements in "oneOf" array.

{
  "oneOf":
  [
    {
      "type": "object",
      "properties":
      {
        "t":
        {
          "type": "string",
          "enum":
          [
            "message_type_1"
          ]
        }
      }
    },
    {
      "type": "object",
      "properties":
      {
        "t":
        {
          "type": "string",
          "enum":
          [
            "message_type_2"
          ]
        },
        "some_other_property":
        {
          "type": "integer"
        }
      },
      "required":
      [
        "some_other_property"
      ]
    }
  ]
}

Possibility 2: Nested "if", "then", "else" triads. I haven't tried it, but I guess that maybe errors would be better in this case. However, it's very cumbersome to write, as nested if's pile up.

Possibility 3: A separate scheme for every possible value of "t". This is the simplest solution, however I dislike it, because it precludes me from using common elements in schemas (via references).

So, are these my only options, or can I do better?

like image 614
MaxEd Avatar asked Apr 13 '18 18:04

MaxEd


People also ask

How do you validate a JSON file against a schema?

The simplest way to check if JSON is valid is to load the JSON into a JObject or JArray and then use the IsValid(JToken, JsonSchema) method with the JSON Schema. To get validation error messages, use the IsValid(JToken, JsonSchema, IList<String> ) or Validate(JToken, JsonSchema, ValidationEventHandler) overloads.

Can we validate JSON with schema?

JSON Schema is a powerful tool. It enables you to validate your JSON structure and make sure it meets the required API. You can create a schema as complex and nested as you need, all you need are the requirements. You can add it to your code as an additional test or in run-time.

What is JSON Schema validator?

JSON Schema Validation: The JSON Schema Validation specification is the document that defines the valid ways to define validation constraints. This document also defines a set of keywords that can be used to specify validations for a JSON API.


1 Answers

Since "type" is a JSON Schema keyword, I'll follow your lead and use "t" as the type-discrimination field, for clarity.

There's no particular keyword to accomplish or indicate this (however, see https://github.com/json-schema-org/json-schema-spec/issues/31 for discussion). This is because, for the purposes of validation, everything you need to do is already possible. Errors are secondary to validation in JSON Schema. All we're trying to do is limit how many errors we see, since it's obvious there's a point where errors are no longer productive.

Normally when you're validating a message, you know its type first, then you read the rest of the message. For example in HTTP, if you're reading a line that starts with Date: and the next character isn't a number or letter, you can emit an error right away (e.g. "Unexpected tilde, expected a month name").

However in JSON, this isn't true, since properties are unordered, and you might not encounter the "t" until the very end, if at all. "if/then" can help with this.

But first, begin by by factoring out the most important constraints, and moving them to the top.

First, use "type": "object" and "required":["t"] in your top level schema, since that's true in all cases.

Second, use "properties" and "enum" to enumerate all its valid values. This way if "t" really is entered wrong, it will be an error out of your top-level schema, instead of a subschema.

If all of these constraints pass, but the document is still invalid, then it's easier to conclude the problem must be with the other contents of the message, and not the "t" property itself.

Now in each sub-schema, use "const" to match the subschema to the type-name.

We get a schema like this:

{
  "type": "object",
  "required": ["t"],
  "properties": { "t": { "enum": ["message_type_1", "message_type_2"] } }
  "oneOf": [
     {
        "type": "object",
        "properties": {
          "t": { "const": "message_type_1" }
        }
     },
     {
        "type": "object",
        "properties": 
          "t": { "const": "message_type_2" },
          "some_other_property": {
             "type": "integer"
          }
        },
        "required": [ "some_other_property" ]
     }
  ]
}

Now, split out each type into a different schema file. Make it human-accessible by naming the file after the "t". This way, an application can read a stream of objects and pick the schema to validate each object against.

{
  "type": "object",
  "required": ["t"],
  "properties": { "t": { "enum": ["message_type_1", "message_type_2"] } }
  "oneOf": [
     {"$ref": "message_type_1.json"},
     {"$ref": "message_type_2.json"}
  ]
}

Theoretically, a validator now has enough information to produce much cleaner errors (though I'm not aware of any validators that can do this).

So, if this doesn't produce clean enough error reporting for you, you have two options:

First, you can implement part of the validation process yourself. As described above, use a streaming JSON parser like Oboe.js to read each object in a stream, parse the object and read the "t" property, then apply the appropriate schema.

Or second, if you really want to do this purely in JSON Schema, use "if/then" statements inside "allOf":

{
  "type": "object",
  "required": ["t"],
  "properties": { "t": { "enum": ["message_type_1", "message_type_2"] } }
  "allOf": [
     {"if":{"properties":{"t":{"const":"message_type_1"}}}, "then":{"$ref": "message_type_1.json"}},
     {"if":{"properties":{"t":{"const":"message_type_2"}}}, "then":{"$ref": "message_type_2.json"}}
  ]
}

This should produce errors to the effect of:

t not one of "message_type_1" or "message_type_2"

or

(because t="message_type_2") some_other_property not an integer

and not both.

like image 181
awwright Avatar answered Sep 22 '22 10:09

awwright