Python data structure validation using Validator (or something similar)

I'm dealing with data input in the form of JSON documents. These documents need to have a certain format; if they're not compliant, they should be ignored. I'm currently using a messy list of if/then checks to verify the format of each JSON document.

I have been experimenting a bit with different Python JSON Schema libraries. They work OK, but I'm still able to submit a document with keys not described in the schema, which makes them useless to me.

This example doesn't raise an exception, although I would expect it to:

#!/usr/bin/python

from jsonschema import Validator

checker = Validator()
schema = {
    "type": "object",
    "properties": {
        "source": {
            "type": "object",
            "properties": {
                "name": {"type": "string"}
            }
        }
    }
}
data = {
    "source": {
        "name": "blah",
        "bad_key": "This data is not allowed according to the schema."
    }
}
checker.validate(data, schema)

My question is twofold:

  • Am I overlooking something in the schema definition?
  • If not, is there another lightweight way to approach this?

Thanks,

Jay

asked Jan 22 '12 12:01 by jay_t



1 Answer

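JSON Schema allows undeclared keys by default, so the validator is behaving as specified. To reject them, add "additionalProperties": False to the object whose keys you want to restrict: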

#!/usr/bin/python

from jsonschema import Validator

checker = Validator()
schema = {
    "type": "object",
    "properties": {
        "source": {
            "type": "object",
            "properties": {
                "name": {"type": "string"}
            },
            # Add this: reject any key of "source" not listed under "properties".
            "additionalProperties": False,
        }
    }
}
data = {
    "source": {
        "name": "blah",
        "bad_key": "This data is not allowed according to the schema."
    }
}
# Now raises a validation error because of "bad_key".
checker.validate(data, schema)
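
The snippet above uses the Validator class as it existed in the jsonschema release current in 2012. Later releases provide a module-level validate function and a ValidationError exception, so a minimal sketch of the same check, assuming one of those releases is installed, might look like this. The is_compliant helper and the sample documents are hypothetical names introduced only for illustration.

```python
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "properties": {
        "source": {
            "type": "object",
            "properties": {
                "name": {"type": "string"}
            },
            # Reject keys of "source" that are not listed under "properties".
            "additionalProperties": False,
        }
    }
}


def is_compliant(document):
    """Hypothetical helper: True if the document matches the schema."""
    try:
        # Note the argument order: the instance first, then the schema.
        validate(document, schema)
    except ValidationError:
        return False
    return True


good = {"source": {"name": "blah"}}
bad = {"source": {"name": "blah", "bad_key": "not allowed"}}
# "additionalProperties" is only set on "source", so an extra
# top-level key is still accepted by this schema.
extra_top_level = {"source": {"name": "blah"}, "extra": 1}

for doc in (good, bad, extra_top_level):
    print(is_compliant(doc))
```

Because "additionalProperties" applies only to the object it is declared on, it has to be repeated at every nesting level where unknown keys should be rejected, including the top level if the document itself must not carry extra keys.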

answered Oct 14 '22 15:10 by Rob Wouters