
Can inode and crtime be used as a unique file identifier?

Tags:

linux

inode

I have a file indexing database on Linux. Currently I use the file path as an identifier. But if a file is moved or renamed, its path changes, so I cannot match my DB record to the new file and have to delete and recreate the record. Even worse, if a directory is moved or renamed, I have to delete and recreate the records for all files and nested directories under it.

I would like to use the inode number as a unique file identifier, but an inode number can be reused once a file is deleted and another file is created.

So, I wonder whether I can use a pair of {inode,crtime} as a unique file identifier. I hope to use i_crtime on ext4 and creation_time on NTFS. In my limited testing (with ext4) inode and crtime do, indeed, remain unchanged when renaming or moving files or directories within the same file system.
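The rename test described above can be reproduced with a short script. This is only a sketch: the helper names `inode_and_birthtime` and `survives_rename` are made up for illustration, and Python's `os.stat()` exposes a creation time (`st_birthtime`) only on some platforms, so on Linux the second element is usually `None` and only the inode number is actually compared.

```python
import os
import tempfile


def inode_and_birthtime(path):
    """Return (inode number, birth time or None) without following symlinks."""
    st = os.lstat(path)
    # st_birthtime exists only on some platforms (e.g. macOS, BSD).
    # Where it is missing, fall back to None instead of guessing.
    return st.st_ino, getattr(st, "st_birthtime", None)


def survives_rename():
    """Create a file, rename it within one directory, compare identifiers."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "a.txt")
        new = os.path.join(tmp, "b.txt")
        with open(old, "w") as f:
            f.write("hello")
        before = inode_and_birthtime(old)
        os.rename(old, new)  # same directory, hence same filesystem
        after = inode_and_birthtime(new)
        return before == after


if __name__ == "__main__":
    print(survives_rename())
```

A passing check here says nothing about fsck, defragmentation, or resizing; it only confirms the rename case within one filesystem.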

So, the question is whether there are cases when the inode or crtime of a file may change. For example, can fsck, defragmentation, or partition resizing change the inode or crtime of a file?

Interestingly, http://msdn.microsoft.com/en-us/library/aa363788%28VS.85%29.aspx says:

  • "In the NTFS file system, a file keeps the same file ID until it is deleted."
    but also:
  • "In some cases, the file ID for a file can change over time."

So, what are those cases they mentioned?

Note that I studied similar questions:

  • How to determine the uniqueness of a file in linux?
  • Executing 'mv A B': Will the 'inode' be changed?
  • Best approach to detecting a move or rename to a file in Linux?

but they do not answer my question.

jhnlmn asked Apr 17 '13 20:04



2 Answers

  • {device_nr,inode_nr} are a unique identifier for an inode within a system
  • moving a file to a different directory does not change its inode_nr
  • the Linux inotify interface lets you monitor changes to inodes (either files or directories)
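As a rough illustration of the first two points, an index can be keyed by `(st_dev, st_ino)` instead of by path, and two scans can then be compared to find files whose path changed. The function names `file_key`, `scan`, and `detect_moves` are hypothetical, and the sketch ignores inode reuse between scans, which is exactly the risk raised in the question.

```python
import os
import tempfile


def file_key(path):
    """Identify a file by (device number, inode number)."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


def scan(root):
    """Map file_key -> current path for every non-directory entry under root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # With hard links, several paths share one key; the last one wins.
            index[file_key(path)] = path
    return index


def detect_moves(old_index, new_index):
    """Return {old_path: new_path} for keys seen in both scans whose path changed."""
    return {
        old_index[key]: new_index[key]
        for key in old_index.keys() & new_index.keys()
        if old_index[key] != new_index[key]
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "d1"))
        open(os.path.join(root, "d1", "f.txt"), "w").close()
        before = scan(root)
        os.rename(os.path.join(root, "d1"), os.path.join(root, "d2"))
        print(detect_moves(before, scan(root)))
```

Renaming a directory leaves the keys of the files inside it untouched, so one rescan is enough to update their paths without recreating records.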

Extra notes:

  • moving files across filesystems is handled differently (it is in fact copy+delete, so the file gets a new inode)
  • networked filesystems (or a mounted NTFS) cannot always guarantee the stability of inode numbers
  • Microsoft is not a Unix vendor; its documentation does not cover Unix or its filesystems, and should be ignored (except for NTFS's internals)
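Whether a move stays inside one filesystem can be checked before trusting the inode number. The sketch below compares `st_dev` and also shows that a plain `rename` across filesystems fails with `EXDEV` rather than copying. The helper names `same_filesystem` and `try_rename` are illustrative only.

```python
import errno
import os
import tempfile


def same_filesystem(path_a, path_b):
    """True if both paths report the same device number (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def try_rename(src, dst):
    """Rename in place; return False if dst is on another filesystem."""
    try:
        os.rename(src, dst)
        return True
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            # Cross-device move: the caller must copy and delete,
            # and the copy will get a new inode number.
            return False
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "a")
        open(src, "w").close()
        print(same_filesystem(tmp, src), try_rename(src, os.path.join(tmp, "b")))
```

Tools such as `mv` fall back to copy+delete when they see `EXDEV`, which is why the inode changes in that case.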

Extra text: the old Unix adage "everything is a file" should in fact be: "everything is an inode". The inode carries all the meta-information about a file (or directory, or special file) except the name. The filename is in fact only a directory entry that happens to link to the particular inode. Moving a file implies creating a new link to the same inode, and deleting the old directory entry that linked to it. The inode metadata can be obtained with the stat(), fstat(), and lstat() system calls.
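The relation between names and inodes can be seen with a hard link and a symlink. This is a small sketch using Python's wrappers for the three calls (`os.stat`, `os.fstat`, `os.lstat`); the function name `demo_links` and the file names are made up.

```python
import os
import tempfile


def demo_links():
    """Return the inode numbers seen through different names and calls."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        hard = os.path.join(tmp, "hardlink")
        soft = os.path.join(tmp, "symlink")
        with open(original, "w") as f:
            f.write("data")
            via_fd = os.fstat(f.fileno()).st_ino  # fstat: by open descriptor
        os.link(original, hard)     # second directory entry, same inode
        os.symlink(original, soft)  # separate inode that stores a path
        return {
            "original": os.stat(original).st_ino,
            "fstat": via_fd,
            "hardlink": os.stat(hard).st_ino,
            "symlink_followed": os.stat(soft).st_ino,  # stat follows the link
            "symlink_itself": os.lstat(soft).st_ino,   # lstat does not
            "nlink": os.stat(original).st_nlink,
        }


if __name__ == "__main__":
    print(demo_links())
```

Both names of the hard-linked file report the same inode, which is why an inode-keyed index has to decide how to treat several paths for one key.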

wildplasser answered Oct 13 '22 22:10


The allocation and management of i-nodes in Unix is dependent upon the filesystem. So, for each filesystem, the answer may vary.

For the Ext3 filesystem (the most popular), i-nodes are reused, and thus cannot be used on their own as a unique file identifier, nor does reuse occur according to any predictable pattern.

In Ext3, i-nodes are tracked in a bit vector, each bit representing a single i-node number. When an i-node is freed, its bit is set to zero. When a new i-node is needed, the bit vector is searched for the first zero bit and the i-node number (which may have been previously allocated to another file) is reused.
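A toy model of such a bit vector makes the reuse visible. This is not Ext3 code, only a minimal sketch of "first zero bit wins"; the class name `InodeBitmap` is invented, and numbering from 1 is an assumption of the sketch.

```python
class InodeBitmap:
    """Toy first-fit inode allocator backed by a list of 0/1 flags."""

    def __init__(self, size):
        self.bits = [0] * size

    def allocate(self):
        """Mark the first free slot as used and return its inode number."""
        for index, bit in enumerate(self.bits):
            if bit == 0:
                self.bits[index] = 1
                return index + 1  # this sketch numbers inodes from 1
        raise OSError("no free inodes")

    def free(self, inode_nr):
        """Clear the bit so the number can be handed out again."""
        self.bits[inode_nr - 1] = 0


if __name__ == "__main__":
    bitmap = InodeBitmap(8)
    first, second, third = bitmap.allocate(), bitmap.allocate(), bitmap.allocate()
    bitmap.free(second)
    print(bitmap.allocate() == second)  # the freed number is reused
```

In this model a deleted file's number goes straight to the next new file, so an index keyed only by inode number would confuse the two.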

This may lead to the naive conclusion that the lowest numbered available i-node will be the one reused. However, the Ext3 file system is complex and highly optimised, so no assumptions should be made about when and how i-node numbers can be reused, even though they clearly will.

From the source code for ialloc.c, where i-nodes are allocated:

There are two policies for allocating an inode. If the new inode is a directory, then a forward search is made for a block group with both free space and a low directory-to-inode ratio; if that fails, then of the groups with above-average free space, that group with the fewest directories already is chosen. For other inodes, search forward from the parent directory's block group to find a free inode.

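The source file that manages this for Ext3 is ialloc.c, and the definitive version is here: https://github.com/torvalds/linux/blob/master/fs/ext3/ialloc.c (the separate ext3 driver was removed from the mainline kernel in version 4.3, so this path exists only in older trees)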
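The second policy in the quoted comment (for non-directory inodes) can be sketched as a forward search over block groups. This is a simplified reading of the comment, not the kernel's implementation: the function name mirrors the idea but the signature, the list of per-group free counts, and the wraparound at the end of the list are all assumptions of this sketch.

```python
def find_group_other(free_inodes, parent_group):
    """Return the first group at or after parent_group that has a free inode.

    free_inodes: list with the number of free inodes in each block group.
    The search wraps around to group 0 (an assumption of this sketch).
    Returns None if every group is full.
    """
    group_count = len(free_inodes)
    for offset in range(group_count):
        group = (parent_group + offset) % group_count
        if free_inodes[group] > 0:
            return group
    return None


if __name__ == "__main__":
    # Parent directory lives in group 2, which is full; group 3 has room.
    print(find_group_other([0, 3, 0, 5], parent_group=2))
```

Because the chosen group depends on where the parent directory lives and on how full each group is, the number a new file receives is hard to predict from the outside.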

Gary Wisniewski answered Oct 13 '22 23:10