
Yet another newtype vs. data (stylistic issue)

I know the differences between data, newtype and type very well. I am writing a small script that will build some sort of syntax tree. Almost all types have one constructor. I am avoiding type to enforce safety (multiple "different" types might end up having the same type in Haskell). I don't care about laziness/strictness in this case, nor do I care about performance (this part is by no means performance critical). I am mainly focused on style. I have three options:

  1. Use only data. This feels OK, except that I have many types with only one constructor taking one argument. The code looks somehow wasteful. I don't care about the performance gain, but it just does not feel right.
  2. Use only newtype. This leads to a lot of ugliness with tuples in the case of multiple parameters.
  3. Mix data and newtype, which looks somewhat non-uniform and slightly annoying. I'd rather have all types declared in a single consistent way.

I am torn between options 1 and 3.

aelguindy asked Feb 15 '12 12:02


2 Answers

In this case, I would use data universally, for a couple of reasons. Firstly, for consistency with the multiple-argument cases (which should definitely be data, not newtype).

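Secondly, and most importantly, newtype has different semantics to data! A newtype constructor is strict (for a newtype N, N _|_ is the same value as _|_), whereas data constructors are non-strict unless you explicitly use strict fields. Even if you don't care about strictness, or all the fields of your data types are strict, there are still some subtle differences: for example, pattern matching on a newtype constructor never forces anything, while pattern matching on a data constructor forces the value being matched.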
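The difference can be observed directly. The following is a minimal sketch, not part of the original answer; the names D, N, matchD, matchN and isDefined are made up for illustration. It uses evaluate and try from Control.Exception to check whether forcing a value to weak head normal form throws.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical one-field types, declared both ways.
data    D = D Int   -- lazy constructor: D undefined is a defined value
newtype N = N Int   -- no runtime wrapper: N undefined is undefined itself

-- Matching on a data constructor forces the argument to WHNF.
matchD :: D -> String
matchD (D _) = "matched"

-- Matching on a newtype constructor forces nothing.
matchN :: N -> String
matchN (N _) = "matched"

-- True if evaluating the argument to WHNF finishes without an exception.
isDefined :: a -> IO Bool
isDefined x = either onErr (const True) <$> try (evaluate x)
  where
    onErr :: SomeException -> Bool
    onErr _ = False

main :: IO ()
main = do
  print =<< isDefined (D undefined)       -- True: the constructor is already in WHNF
  print =<< isDefined (N undefined)       -- False: same as forcing undefined
  print =<< isDefined (matchD undefined)  -- False: the match forces undefined
  print =<< isDefined (matchN undefined)  -- True: the match forces nothing
```

For a syntax tree whose values are always fully defined, none of these four cases arises in practice, which is why the choice can reasonably be made on stylistic grounds.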

I don't think one-constructor, one-argument data types are wasteful: syntactically, they're just as light as a newtype, and semantic consistency seems more important to me.

You said you're not concerned about performance, but if the runtime boxing overhead of a data type were really inconvenient, then you could mix them, as long as you're aware of the semantic differences. However, if you use -funbox-strict-fields, then GHC might be able to optimise away the single-constructor, single-argument data types for you, if they occur as strict fields in other data types.
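As a rough sketch of what that looks like in source (the types Line and Span and the function spanLength are hypothetical, not from the question), a single-field data type can be used as a strict field of another type. The UNPACK pragma requests unpacking for one field; -funbox-strict-fields requests it for every strict field in the module. Whether GHC actually unpacks depends on the optimisation settings, so treat this as a hint rather than a guarantee.

```haskell
-- A one-constructor, one-field data type with a strict field.
data Line = Line !Int
  deriving (Eq, Show)

-- Line used as a strict field of another type. The UNPACK pragmas ask GHC
-- to store the contents of each Line directly inside Span instead of a pointer.
data Span = Span {-# UNPACK #-} !Line {-# UNPACK #-} !Line
  deriving (Eq, Show)

-- Number of lines covered by a span, inclusive of both ends.
spanLength :: Span -> Int
spanLength (Span (Line from) (Line to)) = to - from + 1

main :: IO ()
main = print (spanLength (Span (Line 3) (Line 7)))
```

The source code reads the same as with plain data declarations; only the runtime representation may change.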

Generally, you should use newtype when you're wrapping an existing type, for the purposes of compile-time safety/abstraction, or to define your own instances, and use data whenever the type just happens to be composed of a single field, rather than being a wrapper.
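Applied to a small syntax tree, that rule of thumb might look like the sketch below. All of the names (Ident, Expr, Return) are invented for illustration; the question does not show its actual types.

```haskell
-- A wrapper around an existing type: identifiers are Strings that should not
-- be confused with other Strings, and they get their own Show instance.
newtype Ident = Ident String
  deriving (Eq, Ord)

instance Show Ident where
  show (Ident name) = name

-- Ordinary tree nodes with several constructors.
data Expr
  = Var Ident
  | Lit Int
  | Add Expr Expr
  deriving (Eq, Show)

-- A node that merely happens to have a single field. It is a tree node in its
-- own right, not a wrapper around Expr, so it is declared with data.
data Return = Return Expr
  deriving (Eq, Show)

main :: IO ()
main = print (Return (Add (Var (Ident "x")) (Lit 1)))
```

Here newtype signals "this is really a String with a distinct identity", while data signals "this is a node of the tree", regardless of how many fields the node has.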

ehird answered Sep 19 '22 15:09


When I am building real programs that aren't doing subtle things with laziness, I almost always use newtype for data types with a single constructor and argument and data for everything else:

data Foo = FooA | FooB Int
data Bar = BarA Int Foo
newtype Baz = Baz Bar

At the very least, if you find yourself writing

newtype Foo = Foo (X,Y)

the semantics are identical to

data Foo = Foo X Y

so you might as well use the data version because it's prettier. Indeed

data Foo = Foo Int
newtype Bar = Bar Int

do differ in semantics, but not in any way that ends up being important for "real" programs, where we don't expect to have to know the difference between _|_ and Foo _|_ (because all values are fully-defined anyway).

There is another thing to look at: uniformity in declarations is something to be wary of. It indicates that there is a level of abstraction that you are not encoding in your program, that you are leaving implicit. See if you can encode that level until there is no parallel declaration structure left to exploit. This is not always possible, but try to get close.

luqui answered Sep 22 '22 15:09