Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dependent types for structured data validation

First of all, I don’t really know what’s wrong with dependent types and why we don’t see them implemented in existing languages for practical programming, instead of inventing all kind of tricks (patterns!) to bypass the limitations of current type systems which at best has very simple and limited generalization.

But my question is about Dependent types for data not a program, how or can we use them for structured data validation? Meaning, like json or xml or any kind of structured data , is it possible to verify them efficiently using some dependent type system?

Edit:

I meant by dependent types it’s most wide definition “type that depends on a value”, and not necessary those theorem prover and CoC staff. I don’t know them and I don’t want to go that road, I don’t believe those the only nor ‘the ultimate’ way to get decent dependent types. In FP, coders write their most complex logic everyday in very elegant, constructive way with all simplicity and having no problem at all. I believe they will have their ultimate “elegant” dependent typing.

However, my question was about pure Data , unlike in code when a lot of checking can be just unnecessary, and can just hiding in program flow and logic, even dynamic typing can work fine that way. In data, it’s not the case when you want to check some document correctness and give clear error messages. In another hand data does not have the complexity problem when you have to deal with “functions” in very extreme dependent type system (of CoC family).

like image 333
KA1 Avatar asked Oct 09 '11 12:10

KA1


1 Answers

You would probably be interested in this paper : The Next 700 Data Description Languages (PDF), Kathleen Fisher, Yitzhak Mandelbaum and David Walker, 2006.

The primary goal of this paper is to begin to understand the family of ad hoc data processing languages. We do so, as Landin did, by developing a semantic framework for defining, comparing, and contrasting languages in our domain. This semantic framework revolves around the definition of a data description calculus (DDC^α). This calculus uses types from a dependent type theory to describe various forms of ad hoc data: base types to describe atomic pieces of data and type constructors to describe richer structures. We show how to give a denotational semantics to DDC^α by interpreting types as parsing functions that map external representations (bits) to data structures in a typed lambda calculus. More precisely, these parsers produce both internal representations of the external data and parse descriptors that pinpoint errors in the original source.

In short : yes, dependent types are necessary if you want to statically encode fine-grained invariants about your data. Being more expressive than algebraic data types and GADTs, they also allow to express them and related constructs (such as the combination of untagged union and tagged product), having the ability to be, in some kind, the assembly language of data description, even if the user-facing specification doesn't expose term dependencies directly.

Beware, however, that such a formal approach comes at the cost of a steeper learning curve and higher upfront complexity, even if in theory it pays back with easier, safer, better specifications, manipulations tools and such. The practitioners in the field will more often than not neglect all that type system beauty, and fall back on ill-specified alternatives. XML is losing to JSON, in other reasons because specifying schemas is boring and people don't see what advantages they bring. Yes, you can later specify the static structure of the adopted JSON API (and you may well need dependent types to do that, as complexity easily crept into such evolution-rather-than-design formats), but this is only of little use if nobody cares about it, use it, understand it and, more importantly, maintain it.

(On a secondary front regarding your trollish introduction : please go forward and play with ATS, Guru or Agda, they are meant for relatively practical programming. If you want to go the frankenstein route, there is also SHE; Coq is not designed to be "practical for software development" but has been known to be abused this way -- I would not advise it for dependently typed programming, but it's good for not-very-dependent programming plus accompanying correctness proofs -- and if you want to sell your soul there is also F* coming soon.)

like image 159
gasche Avatar answered Sep 28 '22 11:09

gasche