In all programming languages (that I use at least), you must open a file before you can read or write to it.
But what does this open operation actually do?
Manual pages for typical functions dont actually tell you anything other than it 'opens a file for reading/writing':
http://www.cplusplus.com/reference/cstdio/fopen/
https://docs.python.org/3/library/functions.html#open
Obviously, through usage of the function you can tell it involves creation of some kind of object which facilitates accessing a file.
Another way of putting this would be, if I were to implement an open
function, what would it need to do on Linux?
There is a possibility that when we try to open a file using the function fopen( ), the file may not be opened. While opening the file in “r” mode, this may happen because the file being opened may not be present on the disk at all. And you obviously cannot read a file that doesn't exist.
The act of closing the file (actually, the stream) ends the association; the transaction with the file system is terminated, and input/output may no longer be performed on the stream. The stream function close may be used to close a file; the functions described below may be used to open them.
"w" – Write mode – it opens a file in write mode to write data in it, if the file is opened in write mode then existing data will be removed, it also creates a file, if the file does not exist.
In almost every high-level language, the function that opens a file is a wrapper around the corresponding kernel system call. It may do other fancy stuff as well, but in contemporary operating systems, opening a file must always go through the kernel.
This is why the arguments of the fopen
library function, or Python's open
closely resemble the arguments of the open(2)
system call.
In addition to opening the file, these functions usually set up a buffer that will be consequently used with the read/write operations. The purpose of this buffer is to ensure that whenever you want to read N bytes, the corresponding library call will return N bytes, regardless of whether the calls to the underlying system calls return less.
I am not actually interested in implementing my own function; just in understanding what the hell is going on...'beyond the language' if you like.
In Unix-like operating systems, a successful call to open
returns a "file descriptor" which is merely an integer in the context of the user process. This descriptor is consequently passed to any call that interacts with the opened file, and after calling close
on it, the descriptor becomes invalid.
It is important to note that the call to open
acts like a validation point at which various checks are made. If not all of the conditions are met, the call fails by returning -1
instead of the descriptor, and the kind of error is indicated in errno
. The essential checks are:
In the context of the kernel, there has to be some kind of mapping between the process' file descriptors and the physically opened files. The internal data structure that is mapped to the descriptor may contain yet another buffer that deals with block-based devices, or an internal pointer that points to the current read/write position.
I'd suggest you take a look at this guide through a simplified version of the open()
system call. It uses the following code snippet, which is representative of what happens behind the scenes when you open a file.
0 int sys_open(const char *filename, int flags, int mode) {
1 char *tmp = getname(filename);
2 int fd = get_unused_fd();
3 struct file *f = filp_open(tmp, flags, mode);
4 fd_install(fd, f);
5 putname(tmp);
6 return fd;
7 }
Briefly, here's what that code does, line by line:
The filp_open
function has the implementation
struct file *filp_open(const char *filename, int flags, int mode) {
struct nameidata nd;
open_namei(filename, flags, mode, &nd);
return dentry_open(nd.dentry, nd.mnt, flags);
}
which does two things:
struct file
with the essential information about the inode and return it. This struct becomes the entry in that list of open files that I mentioned earlier.Store ("install") the returned struct into the process's list of open files.
read()
, write()
, and close()
. Each of these will hand off control to the kernel, which can use the file descriptor to look up the corresponding file pointer in the process's list, and use the information in that file pointer to actually perform the reading, writing, or closing.If you're feeling ambitious, you can compare this simplified example to the implementation of the open()
system call in the Linux kernel, a function called do_sys_open()
. You shouldn't have any trouble finding the similarities.
Of course, this is only the "top layer" of what happens when you call open()
- or more precisely, it's the highest-level piece of kernel code that gets invoked in the process of opening a file. A high-level programming language might add additional layers on top of this. There's a lot that goes on at lower levels. (Thanks to Ruslan and pjc50 for explaining.) Roughly, from top to bottom:
open_namei()
and dentry_open()
invoke filesystem code, which is also part of the kernel, to access metadata and content for files and directories. The filesystem reads raw bytes from the disk and interprets those byte patterns as a tree of files and directories./dev/sda
and the like.)This may also be somewhat incorrect due to caching. :-P Seriously though, there are many details that I've left out - a person (not me) could write multiple books describing how this whole process works. But that should give you an idea.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With