Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What (exactly) are "First Class" modules?

I often read some programming languages have "First Class" support for modules (OCaml, Scala, TypeScript[?]) and recently stumbled upon an answer on SO citing modules as first class citizens among the distinguishing features of Scala.

I thought I knew very well what Modular Programming means but after these incidents I'm beginning to doubt my understanding...

I think modules are nothing special but instances of certain classes that are acting as mini-libraries. The mini-library code goes into a class, objects of that class are the modules. You can pass them around as dependencies to any other class that requires the services provided by the module, so any decent OOPL has first class modules but apparently not!

  1. What exactly is a module? How is it different than, say, a plain class or an object?
  2. How is (1) related (or not) to the Modular Programming that we all know?
  3. What exactly does it mean for a language to have first class modules? What are the benefits? what are the drawbacks if a languages lacks such feature?
like image 817
Ashkan Kh. Nazary Avatar asked Jun 13 '19 07:06

Ashkan Kh. Nazary


People also ask

Is a module the same as a class?

What is the difference between a class and a module? Modules are collections of methods and constants. They cannot generate instances. Classes may generate instances (objects), and have per-instance state (instance variables).

What are Functors in OCaml?

A functor is a module that is parametrized by another module, just like a function is a value which is parametrized by other values, the arguments. It allows one to parametrize a type by a value, which is not possible directly in OCaml without functors.


Video Answer


4 Answers

A module, as well as a subroutine, is a way of organizing your code. When we develop programs, we pack instructions into subroutines, subroutines into structures, structures into packages, libraries, assemblies, frameworks, solutions, and so on. So, putting everything else aside, it is just a mechanism to organize your code.

The essential reason, why we use all those mechanisms, instead of just laying out our instructions linearly, is because the complexity of a program grows non-linearly with respect to its size. In other words, a program built from n pieces each having m instructions is easier to comprehend than a program which is built from n*m instructions. This is, of course, not always true (otherwise we can just split our program into arbitrary parts and be happy). In fact, for that to be true, we have to introduce one essential mechanism called abstraction. We can benefit from splitting a program into manageable subparts only if each part provides some sort of abstraction. For example, we can have, connect_to_database, query_for_students, sort_by_grade, and take_the_first_n abstractions packed as functions or subroutines, and is much easier to understand the code which is expressed in terms of those abstractions, rather than trying to understand the code in which all those functions are inlined.

So now we have functions and it is natural to introduce the next level of organization -- collections of functions. It is common to see that some functions build families around some common abstraction, e.g., student_name, student_grade, student_courses, etc, they all revolve around the same abstraction student. The same is for connection_establish, connection_close, etc. Therefore we need some mechanism that will tie together those functions. Here we are starting to have options. Some languages took the OOP path, in which objects and classes are the units of the organization. Where a bunch of functions and a state is called an object. Other languages took a different path and decided to combine functions into static structures called modules. The main difference is that a module is a static, compile-time structure, where objects are runtime structures that have to be created in runtime to be used. As a result, naturally, objects tend to contain state, while modules do not (and contain only code). And objects are inherently regular values, which you can assign to variables, store them in files, and do other manipulations which you can do with data. Classical modules, contrary to objects, do not have runtime representation, therefore you can't pass modules as parameters to your functions, store them in a list, and otherwise perform any computations on modules. This is basically what people mean by saying first class citizen - an ability to treat an entity as a simple value.

Back to composable programs. In order to make objects/modules composable, we need to be sure that they create abstractions. For functions abstraction boundary is clearly defined - it is the tuple of parameters. For objects, we have a notion of interfaces and classes. While for modules we have only interfaces. Since modules are inherently more simple (they do not include the state) we do not have to deal with their constructing and deconstructing, therefore we do not need a more complicated notion of a class. Both classes and interfaces is a way to classify objects and modules by some criteria so that we can reason about different modules without looking into the implementation, the same way as we did with connect_to_database, query_for_students, et al functions - we were reasoning about them only based on their name and interface (and probably documentation). Now we can have a class student or a module Student both defining an abstraction called student, so that we can save a lot of brain power, without having to deal with the way how are those students implemented.

And beyond making our programs easier to understand, abstractions give us another benefit -- generalization. Since we don't need to reason about the implementation of a function or a module, it means that all implementations are interchangeable to some degree. Therefore, we can write our programs so that they will express their behavior in a general way, without breaking the abstractions, and then choose particular instances when we run our programs. Objects are runtime instances and essentially it means that we can choose our implementation in runtime. Which is nice. Classes are, however, rarely first-class citizens, therefore we have to invent different cumbersome methods to make the selection, like the Abstract Factory and Builder design patterns. For modules, the situation is even worse, since they are inherently a compile-time structure, we have to choose our implementation at the program building/lining time. Which is not what people want to do in the modern world.

And here comes first-class modules, being an amalgamation of modules and objects, they give us the best of two worlds - an easy to reason about stateless structures, which are, at the same time, a pure first-class citizens, which you can store in a variable, put into list and select the desired implementation in runtime.

Speaking of OCaml, underneath the hood, first-class modules are simply a record of functions. In OCaml, you can even add state to the first-class module making it practically indistinguishable from an object. This brings us to another topic - in the real world, the separation between objects and structures is not that clear. For example, OCaml provides both modules and objects and you can put objects inside modules and even vice verse. In C/C++ we have compilation units, symbols visibility, opaque data types, and header files, which enables some sort of modular programming, as well as we have structures and namespaces. Therefore, the difference is sometimes hard to tell.

Therefore, to summarize. Modules are pieces of code with a well-defined interface to access this code. First class modules are modules which could be manipulated as a regular value, e.g., stored in a data structure, assigned a variable, and picked at runtime.

like image 151
ivg Avatar answered Sep 27 '22 04:09

ivg


OCaml perspective here.

Modules and classes are very different.

First of all, classes in OCaml are a very specific (and complex) feature. To go into some detail, classes implement inheritance, row polymorphism and dynamic dispatch (aka virtual methods). It allows them to be highly flexible at the expense of some efficiency.

Modules, however, are quite a different thing altogether.

Indeed, you can see modules as atomic mini-libraries, and usually they are used to define a type and its accessors, but they are much more powerful than just that.

  • Modules allow you to create several types, as well as module types and submodules. Basically, they allow to create complex compartmentalization and abstraction.
  • Functors give you behavior similar to c++'s templates. Except they are safe. Basically, they are functions on modules, which allows you to parameterize a data structure or algorithm over some other module.

Modules are usually solved statically and therefore easy to inline, allowing you to write clear code without fear of a loss in efficiency.

Now, a first-class citizen is an entity that can be put in a variable, passed to a function and tested for equality. In a way, it means they will be dynamically evaluated.

For example, suppose you have a module Jpeg and a module Png that allow you to manipulate different kind of pics. Statically, you don't know what kind of image you'll need to display. So you can use first-class modules:

let get_img_type filename =
 match Filename.extension filename with
 | ".jpg" | ".jpeg" -> (module Jpeg : IMG_HANDLER)
 | ".png" -> (module Png : IMG_HANDLER)

let display_img img_type filename =
 let module Handler = (val img_type : IMG_HANDLER) in
 Handler.display filename
like image 38
PatJ Avatar answered Sep 25 '22 04:09

PatJ


The main differences between a module and an object usually are

  • Modules are second-class, i.e., they are rather static entities that cannot be passed around as values, while objects can.
  • Modules can contain types and all other forms of declarations (and types can be made abstract), while objects typically cannot.

However, as you note, there are languages where modules can be wrapped up as first-class values (e.g. Ocaml) and there are languages where objects can contain types (e.g. Scala). That blurs the line a little. There still tend to be various biases towards certain patterns, with different trade-offs made in the type systems. For example, objects focus on recursive types, while modules focus on type abstraction and allowing any definition. It is a very hard problem to support both at the same time without severe compromises, since that quickly leads to an undecidable type system.

like image 35
Andreas Rossberg Avatar answered Sep 23 '22 04:09

Andreas Rossberg


As has been mentioned already, "modules", "classes" and "objects" are more like tendencies than strict formal definitions. And if you implement modules as objects for example, as I understand Scala does, then obviously there are no fundamental differences between them, but mostly just syntactic differences that make them more convenient for certain use cases.

In regards to OCaml specifically though, here's a practical example of something you cannot do with modules that you can do with classes because of fundamental differences in implementation:

Modules have functions, which can reference each other recursively using the rec and and keyword. A module can also "inherit" the implementation of another module using include and override its definitions. For example:

module Base = struct
  let name = "base"
  let print () = print_endline name
end

module Child = struct
  include Base
  let name = "child"
end

but because modules are early bound, that is, names are resolved at compile time, it's not possible to get Base.print to reference Child.name instead of Base.name. At least not without altering both Base and Child significantly to explicitly enable it:

module AbstractBase(T : sig val name : string end) = struct
  let name = T.name
  let print () = print_endline name
end

module Base = struct
  include AbstractBase(struct let name = "base" end)
end

module Child = struct
  include AbstractBase(struct let name = "child" end)
end

With classes on the other hand, overriding is trivial and the default:

class base = object(self)
  method name = "base"
  method print = print_endline self#name
end

class child = object
  inherit base
  method! name = "child"
end

Classes can reference themselves, through a variable conventionally named this or self (in OCaml you can name it whatever you want, but self is the convention). They are also late bound, meaning they are resolved at runtime and can therefore call method implementations that didn't exist when it was defined. This is called open recursion.

So why aren't modules late bound too? Primarily for performance reasons I think. Doing a dictionary search on the name of every function call will undoubtedly have a significant impact on execution time.

like image 45
glennsl Avatar answered Sep 27 '22 04:09

glennsl