
Why does std::filesystem provide so many non-member functions?

Consider, for example, file_size. To get the size of a file we write

std::filesystem::path p = std::filesystem::current_path();
// ... usual "does this exist && is this a file" boilerplate
auto n = std::filesystem::file_size(p);

Nothing wrong with that, if it were plain ol' C, but having been taught that C++ is an OO language [I do know it's multi-paradigm, apologies to our language lawyers :-)] that just feels so ... imperative (shudder) to me, where I have come to expect the object-ish

auto n = p.file_size();

instead. The same holds for other functions, such as resize_file, remove and probably more.

Do you know of any rationale why Boost and consequently std::filesystem chose this imperative style instead of the object-ish one? What is the benefit? Boost mentions the rule (at the very bottom), but no rationale for it.

I was thinking about inherent issues such as p's state after remove(p), or error flags (overloads with an additional argument), but neither approach solves these less elegantly than the other.


You can observe a similar pattern with iterators, where nowadays we can (are supposed to?) do begin(it) instead of it.begin(), but here I think the rationale was to be more in line with the non-modifying next(it) and such.

dlw asked Mar 27 '17 17:03


2 Answers

There are a couple of good answers already posted, but they do not get to the heart of the matter: all other things being equal, if you can implement something as a free, non-friend function, you always should.

Why?

Because free, non-friend functions do not have privileged access to state. Testing classes is much harder than testing functions because you have to convince yourself that the class's invariants are maintained no matter which member functions are called, or in which combination. The more member/friend functions you have, the more work you have to do.

Free functions can be reasoned about and tested standalone. Because they don't have privileged access to class state, they cannot possibly violate any class invariants.
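To make that concrete, here is a minimal sketch. The class Readings and the function total are hypothetical names invented for illustration, not part of any library. The single mutating member guards the invariant, while the free function goes through the public interface only, so it can be tested without re-checking the invariant.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical class (illustration only): stores readings, all of them non-negative.
class Readings {
public:
    // The only mutating member, so it is the only place the invariant can be broken.
    void add(double value) {
        if (value < 0.0) throw std::invalid_argument("negative reading");
        values_.push_back(value);
    }
    std::size_t size() const { return values_.size(); }
    double at(std::size_t i) const { return values_.at(i); }

private:
    std::vector<double> values_;  // invariant: every element >= 0
};

// Free, non-friend function: it only sees the public interface,
// so it cannot violate the invariant no matter how it is written.
double total(const Readings& r) {
    double sum = 0.0;
    for (std::size_t i = 0; i < r.size(); ++i) sum += r.at(i);
    return sum;
}
```

If total were a member instead, it would be one more function with access to values_ that you would have to audit whenever the invariant is in question.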

I don't know the details of what invariants path maintains or what privileged access it allows, but obviously they were able to implement a lot of functionality as free functions, and they made the right choice in doing so.

Scott Meyers' brilliant article on this topic, "How Non-Member Functions Improve Encapsulation", gives the "algorithm" for deciding whether to make a function a member or not.

Herb Sutter has also bemoaned the massive interface of std::string. Why? Because much of string's interface could have been implemented as free functions. Free functions may be a bit more unwieldy to use on occasion, but they are easier to test and reason about, improve encapsulation and modularity, open up opportunities for code reuse that were not there before, etc.

Nir Friedman answered Oct 05 '22 12:10


The Filesystem library has a very clear separation between the filesystem::path type, which represents an abstract path name (that doesn't even have to be the name of a file that exists), and operations that access the actual physical filesystem, i.e. read+write data on disks.

You even pointed to the explanation of that:

"The design rule is that purely lexical operations are supplied as class path member functions, while operations performed by the operating system are provided as free functions."

This is the reason.

It's theoretically possible to use a filesystem::path on a system with no disks. The path class just holds a string of characters and allows manipulating that string, converting between character sets and using some rules that define the structure of filenames and pathnames on the host OS. For example it knows that directory names are separated by / on POSIX systems and by \ on Windows. Manipulating the string held in a path is a "lexical operation", because it just performs string manipulation.

The non-member functions that are known as "filesystem operations" are entirely different. They don't just work with an abstract path object that is just a string of characters, they perform the actual I/O operations that access the filesystem (stat system calls, open, readdir etc.). These operations take a path argument that names the files or directories to operate on, and then they access the real files or directories. They don't just manipulate strings in memory.

Those operations depend on the API provided by the OS for accessing files, and they depend on hardware that might fail in completely different ways to in-memory string manipulations. Disks might be full, or might get unplugged before an operation completes, or might have hardware faults.
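As a rough sketch of what that means for callers: every filesystem operation has to report OS-level failure somehow, which is why std::filesystem::file_size comes both as a throwing overload and as one taking a std::error_code. The wrapper try_file_size below is a hypothetical helper written for this illustration, not something the library provides.

```cpp
#include <cstdint>
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (illustration only): returns the file size,
// or an empty optional if the OS-level query fails for any reason
// (missing file, permission denied, I/O error, ...).
std::optional<std::uintmax_t> try_file_size(const fs::path& p) {
    std::error_code ec;
    const std::uintmax_t n = fs::file_size(p, ec);  // non-throwing overload
    if (ec) return std::nullopt;
    return n;
}
```

Constructing the path argument itself never needs this kind of error handling; only the call that goes to the operating system does.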

Looked at like that, of course file_size isn't a member of path, because it's nothing to do with the path itself. The path is just a representation of a filename, not of an actual file. The function file_size looks for a physical file with the given name and tries to read its size. That's not a property of the file name, it's a property of a persistent file on the filesystem. Something that exists entirely separately from the string of characters in memory that holds the name of a file.

Put another way, I can have a path object that contains complete nonsense, like filesystem::path p("hgkugkkgkuegakugnkunfkw"), and that's fine. I can append to that path, or ask if it has a root directory etc. But I can't read the size of such a file if it doesn't exist. I can have a path to files that do exist but that I don't have permission to access, like filesystem::path p("/root/secret_admin_files.txt"); and that's also fine, because it's just a string of characters. I'd only get a "permission denied" error when I tried to access something in that location using the filesystem operation functions.

Because path member functions never touch the filesystem they can never fail due to permissions, or non-existent files. That's a useful guarantee.
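A small sketch of that guarantee, assuming nothing about what is on disk: backup_name is a hypothetical helper made up for this example, and it uses only path members, so it behaves the same whether or not the named file exists or is readable.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper (illustration only): swaps the extension for ".bak".
// Purely lexical: it edits the string held by the path and never asks the OS anything.
fs::path backup_name(const fs::path& original) {
    fs::path result = original;
    result.replace_extension(".bak");
    return result;
}
```

Calling backup_name("no/such/dir/report.txt") yields a path naming no/such/dir/report.bak even though no such directory needs to exist.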

"You can observe a similar pattern with iterators, where nowadays we can (are supposed to?) do begin(it) instead of it.begin(), but here I think the rationale was to be more in line with the non-modifying next(it) and such."

No, it was because it works equally well with arrays (which can't have member functions) and class types. If you know the range-like thing you are dealing with is a container not an array then you can use x.begin() but if you're writing generic code and don't know whether it's a container or an array then std::begin(x) works in both cases.
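For instance, a generic function written against std::begin and std::end accepts both kinds of range. The name sum_all is just a hypothetical example, and this is only a sketch of the idea:

```cpp
#include <iterator>
#include <vector>

// Hypothetical generic function (illustration only): sums the elements of
// anything std::begin/std::end accept, whether a built-in array or a container.
template <typename Range>
long sum_all(const Range& r) {
    long total = 0;
    for (auto it = std::begin(r); it != std::end(r); ++it) total += *it;
    return total;
}
```

With int raw[] = {1, 2, 3}; and std::vector<int> v{4, 5, 6}; both sum_all(raw) and sum_all(v) compile, whereas a version written with r.begin() would reject the array.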

The reasons for both these things (the filesystem design and the non-member range access functions) are not some anti-OO preference; they're far more sensible, practical reasons. It would have been poor design to have based either of them on what feels better to some people who like OO, or on what feels better to people who don't like OO.

Also, there are things you can't do when everything's a member function:

struct ConvertibleToPath {
  operator const std::filesystem::path& () const;
  // ...
};

ConvertibleToPath c;
auto n = std::filesystem::file_size(c);  // works fine

But if file_size was a member of path:

c.file_size();   // wouldn't work
static_cast<const std::filesystem::path&>(c).file_size(); // yay, feels object-ish!

Jonathan Wakely answered Oct 05 '22 12:10