Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What happens internally when deleting an opened file in linux

Tags:

linux

file-io

I came across this and this questions on deleting opened files in linux

However, I'm still confused what happened in the RAM when a process(call it A) deletes an opened file by another process B.

What baffles me is this(my analysis could be wrong, please correct me if so):

  • When a process opens a file, a new entry for that file in the UFDT is created.
  • When a process deletes a file, all the links to the file are gone especially, we have no reference to its inode, thus, it gets removed from the GFDT
  • However, when modifying the file(say writing to it) it must be updated in the disk(since its pages gets modified/dirty), but it got no reference in the GFDT because of the earlier delete, so we don't know the inode to it.

The Question is why the "deleted" file still accessible by the process which opened it? And how is that been done by the operating system?

EDIT By UFDT i mean the file descriptor table of the process which holds the file descriptors of the files which opened by the process(each process has its own UFDT) and the GFDT is the global file descriptor table, there is only one GFDT in the system(RAM in our case).

like image 786
ThunderWiring Avatar asked Feb 10 '23 17:02

ThunderWiring


1 Answers

I never really heard about those UFDT and GFDT acronyms, but your view of the system sounds mostly right. I think you lack some detail on your description of how open files are managed by the kernel, and perhaps this is where your confusion comes from. I'll try to give a more detailed description.

First, there are three data structures used to keep track of and manage open files:

  • Each process has a table of file descriptors. Each entry in this table stores a file descriptor, and the file descriptor status flags (as of now, the only flag is O_CLOEXEC). The file descriptor is just a pointer to an entry in the file table entry, which I cover next. The integer returned by open(2) and family is usually an index into this file descriptor table - each process has its table, that's why open(2) and family may return the same value for different processes opening different files.
  • There is one opened files table in the entire system. Each file descriptor table entry of each process references one of these entries in the opened files table. There is one entry in this table for each opened file: if two processes open the same file, two entries in this global table are created, even though it's the same file. Each entry in the files table stores the file status flags (opened for reading, writing, appending, etc), and the current file offset. This is why different processes can read from and write to different offsets in the same file concurrently as long as each of them opens the file.
  • Each entry in the file table entry also references an entry in the vnode table. The vnode table is a global table that has one entry for each unique file. If processes A, B, and C open file D, there will be only one vnode table entry, referenced by all 3 of the file table entries (in Linux, there is really no vnode, rather there is an inode, but let's keep this description generic and conceptual). The vnode entry contains pretty much the same information as the traditional inode (file size, other attributes, etc.), but it also contains other information useful for opened files, such as file locks that are active, who owns them, which portions of the file they lock, etc. This vnode entry also stores pointers to the file's data blocks on disk.

Deleting a file consists of calling unlink(2). This function unlinks a file from a directory. Each file inode in disk has a count of the number of links pointing to it; the file is only really removed if the link count reaches 0 and it is not opened (or 2 in the case of directories, since a directory references itself and is also referenced by its parent). In fact, the manpage for unlink(2) is very specific about this behavior:

unlink - delete a name and possibly the file it refers to

So, instead of looking at unlinking as deleting a file, look at it as deleting a file name, and maybe the file it refers to.

When unlink(2) detects that there is an active vnode table entry referring this file, it doesn't delete the file from the filesystem. Nothing happens. Yes, you can't find the file on your filesystem anymore. find(1) won't find it. You can't open it in new processes.

But the file is still there. It just doesn't appear in any directory entry.

For example, if it's a huge file, and if you run df or du, you will see that space usage is the same. The file is still there, on disk, you just can't reach it.

So, any reads or writes take place as usual - the file data blocks are accessible through the vnode table entry. You can still know the file size. And the owner. And the permissions. All of it. Everything's there.

When the process terminates or explicitly closes the file, the operating system checks the inode. If the number of links pointing to the inode is 0 and this was the last process that opened the file (which is also indicated by storing a link count in the vnode table entry), then the file is purged.

like image 66
Filipe Gonçalves Avatar answered Feb 13 '23 02:02

Filipe Gonçalves