How to separate parsing from validation in the case of versioned configs using Scala?

Background

I have a set of configuration JSON files that look like the following:

{
  "version" : 1.0,
  "startDate": 1548419535,
  "endDate": 1558419535,
  "sourceData" : [...],  // nested JSON inside the list
  "destData" : [...],    // nested JSON inside the list
  "extra" : ["business_type"]
}

There are several such config files. They are fixed and reside in my code directory only. The internal representation of each config file is given by my case class Config:

case class Attribute(name: String, mappedTo: String)

case class Data(location: String, mappings: List[Attribute])

case class Config(version: Double, startDate: Long, endDate: Long, sourceData: List[Data],
                  destData: List[Data], extra: List[String])

I have three classes: Provider, Parser and Validator.

  1. Provider has a method getConfig(date: Long): Config. It has to return the config satisfying startDate <= date <= endDate (ideally exactly one such config should be present, as the range from startDate to endDate defines which version of the config is returned).
  2. getConfig calls a method inside Parser called parseList(jsonConfigs: List[String]): Try[List[Config]]. parseList tries to deserialize every config in the list, each to an instance of case class Config. If even one JSON fails to deserialize, parseList returns a scala.util.Failure; otherwise it returns a scala.util.Success[List[Config]].
  3. If a scala.util.Success[List[Config]] is returned from the previous step, getConfig then calls a method inside Validator, def validate(configs: List[Config], date: Long): ValidationResult[Config], and returns its result. Because I want all errors to be accumulated, I am using Cats Validated for validation. I have even asked a question about its correct usage here.
  4. validate first checks that exactly one Config in the List is applicable for the given date (startDate <= date <= endDate); if not, it returns an invalidNel. It then performs some validations on that Config: basic ones, such as checking that various Lists and Strings are non-empty, and semantic ones, such as checking that each String in the field extra is present in the mappings of each source/dest Data.
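
Step 4 can be sketched without Cats to show the shape of the logic: selecting the single applicable config has to happen first (there is nothing to validate until one config is chosen), while the individual checks are independent and their errors can be accumulated. This is a minimal sketch under assumptions: errors are plain Strings, Either[List[String], A] is a hand-rolled stand-in for ValidatedNel[String, A], "present in mappings" is assumed to mean matching Attribute.name, and the names ConfigValidation, selectApplicable, validateConfig and check are hypothetical.

```scala
case class Attribute(name: String, mappedTo: String)
case class Data(location: String, mappings: List[Attribute])
case class Config(version: Double, startDate: Long, endDate: Long,
                  sourceData: List[Data], destData: List[Data], extra: List[String])

object ConfigValidation {
  // Stand-in for Cats' ValidatedNel[String, A]: Left carries every error found.
  type Result[A] = Either[List[String], A]

  // Exactly one config must cover the date (startDate <= date <= endDate).
  def selectApplicable(configs: List[Config], date: Long): Result[Config] =
    configs.filter(c => c.startDate <= date && date <= c.endDate) match {
      case single :: Nil => Right(single)
      case Nil           => Left(List(s"no config applicable for date $date"))
      case many          => Left(List(s"${many.size} configs applicable for date $date, expected exactly one"))
    }

  // One independent check: Nil if it passed, otherwise its error message.
  private def check(ok: Boolean, msg: => String): List[String] =
    if (ok) Nil else List(msg)

  // Runs every check and accumulates failures instead of stopping at the first.
  def validateConfig(c: Config): Result[Config] = {
    val allData = c.sourceData ++ c.destData
    val errors =
      check(c.sourceData.nonEmpty, "sourceData is empty") ++
        check(c.destData.nonEmpty, "destData is empty") ++
        allData.flatMap(d => check(d.location.nonEmpty, "a Data has an empty location")) ++
        (for {
          d <- allData
          e <- c.extra
          // Assumption: an "extra" field counts as present if some Attribute.name equals it.
          if !d.mappings.exists(_.name == e)
        } yield s"extra field '$e' missing from mappings of '${d.location}'")
    if (errors.isEmpty) Right(c) else Left(errors)
  }

  // Selection is sequential (fail fast); the checks on the chosen config accumulate.
  def validate(configs: List[Config], date: Long): Result[Config] =
    selectApplicable(configs, date).flatMap(validateConfig)
}
```

With Cats, Result would become ValidatedNel[String, A], the independent checks would be combined with mapN, and the dependency on the selected config would be expressed with Validated's andThen.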

Question

  1. The question that has troubled me for the last couple of days is this: my sole purpose in using Cats Validated was to collect all errors (rather than fail fast on the first validation error). But by the time I reach the validate method, I have already done some validation in the parseList method: I have already checked in parseList that my JSON structure matches my case class Config. However, parseList doesn't accumulate errors the way validate does, so if there are many incompatibilities between my JSON structure and my case class Config, I only learn about the first one. I would like to learn about all of them at once.
  2. It gets worse if I start adding require clauses, such as nonEmpty checks, inside the case class itself (they are invoked when the case class is constructed, i.e. during parsing itself), e.g.

    case class Data(location: String, mappings: List[Attribute]) {
      require(location.nonEmpty)
      require(mappings.nonEmpty)
    }
    

So I am not able to draw a line between my parsing and my validation functionality properly.
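
One common way to keep require out of the case class is a smart constructor: building the case class never throws, so the JSON library only decodes structure, and the invariants become ordinary values that can be accumulated in the validation phase. A minimal sketch, assuming plain String errors; Data.create is a hypothetical name, and Either[List[String], Data] stands in for Cats' ValidatedNel.

```scala
case class Attribute(name: String, mappedTo: String)

// No require(...) in the body: constructing a Data never throws,
// so decoding the raw JSON structure cannot fail on business rules.
case class Data(location: String, mappings: List[Attribute])

object Data {
  // Checks both invariants and reports every violation, not just the first.
  def create(location: String, mappings: List[Attribute]): Either[List[String], Data] = {
    val errors =
      (if (location.nonEmpty) Nil else List("location must be non-empty")) ++
        (if (mappings.nonEmpty) Nil else List("mappings must be non-empty"))
    if (errors.isEmpty) Right(Data(location, mappings)) else Left(errors)
  }
}
```

The effect is that parsing answers only "does this JSON have the right shape?", while "is this Data acceptable?" is answered by a function whose errors can be combined with the rest of the validation.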

  1. One solution I thought of was to abandon the JSON library I am currently using (lift-json) and use play-json instead. It can accumulate errors like Cats Validated (I learned about it here; it goes really well with Cats invalidNel). I would first parse the JSON into play-json's JSON AST, JsValue, and check structural compatibility between the JsValue and my Config using play-json's validate method (which accumulates errors). If that succeeds, I would read the Config case class from the JsValue and perform the later validations I gave examples of above, using Cats.
  2. But I need to parse all configs to see which one is applicable for a given date. I don't proceed if even one config fails to deserialize; if all deserialize successfully, I pick the one whose (startDate, endDate) encloses the given date. So if I follow the solution above, I have pushed the conversion of List[JsValue] to List[Config] into the validation phase. If every JsValue in the List deserializes successfully to a Config instance, I can choose the applicable one, perform more validations on it and return the result. But what do I do if some JsValues fail to deserialize? Should I return their errors? That doesn't seem intuitive. The core problem is that I need to parse all configs to find the one applicable for a given date, and this makes it harder to separate the parsing and validation phases.

How do I draw a line between parsing and validating a config in my scenario? Do I change the way I maintain versions (a version being valid from start to end date)?

PS: I am a complete novice programmer in general. Forgive me if my question is weird. I never thought I would spend so much time on validation while learning Scala.

asked Jan 25 '19 at 14:01 by sashas


1 Answer

You wrote that validate "checks if exactly one Config in the List matches" the given date.

If that behaviour is the requirement, then malformed JSON files are a validation error too. You can change the Try[List[Config]] return type to List[Try[Config]] and integrate it with Validated where necessary. The Cats documentation has convenience methods for working with standard-library classes, such as Validated.fromTry.
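
A minimal sketch of that change, assuming a per-file parser parseOne: String => Try[A] (hypothetical; in practice it would wrap lift-json or play-json and produce a Config). Each file is parsed independently, so one malformed file no longer hides the others, and every failure is collected before any config is selected. It stays on the standard library; with Cats, each Try could be lifted with Validated.fromTry instead. The ParseAll, parseEach and collect names are illustrative only.

```scala
import scala.util.{Failure, Success, Try}

object ParseAll {
  // Parse each JSON document on its own: List[Try[A]] instead of Try[List[A]].
  def parseEach[A](jsons: List[String])(parseOne: String => Try[A]): List[Try[A]] =
    jsons.map(parseOne)

  // Collapse the per-file results, keeping every failure message (tagged with its index).
  def collect[A](results: List[Try[A]]): Either[List[String], List[A]] = {
    val errors = results.zipWithIndex.collect {
      case (Failure(e), i) => s"config #$i: ${e.getMessage}"
    }
    if (errors.nonEmpty) Left(errors)
    else Right(results.collect { case Success(a) => a })
  }
}
```

Only when collect returns Right does the selection by date and the per-config validation run, so parse errors are reported together, as part of validation, rather than one at a time.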

If we can simply take the first config that matches, it's even easier: make the same change and just find the first one in the list that matches when looking up the config.
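
A sketch of the "first match" variant, again assuming a hypothetical per-file parser parseOne: String => Try[Config]. Using an iterator means files are parsed lazily and the lookup stops at the first config that both parses and covers the date; files that fail to parse are skipped in this sketch, which is an assumption about what "just find the first one" allows.

```scala
import scala.util.{Success, Try}

case class Attribute(name: String, mappedTo: String)
case class Data(location: String, mappings: List[Attribute])
case class Config(version: Double, startDate: Long, endDate: Long,
                  sourceData: List[Data], destData: List[Data], extra: List[String])

object FirstMatch {
  // Returns the first successfully parsed config whose window contains `date`.
  // Failed parses are skipped; None means no applicable config was found.
  def firstApplicable(jsons: List[String], date: Long)(parseOne: String => Try[Config]): Option[Config] =
    jsons.iterator
      .map(parseOne) // lazy: stops parsing once a match is found
      .collectFirst { case Success(c) if c.startDate <= date && date <= c.endDate => c }
}
```

The trade-off is that a malformed file is silently ignored, which is only acceptable if a broken file that would have matched the date is not a concern.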

answered Nov 15 '22 at 20:11 by rahilb