I know that my question has an answer here: QFile seek performance. But I am not completely satisfied with the answer. Even after looking at the following implementation of generic_file_llseek()
for ext4, I can't seem to understand how can the complexity be measured.
/**
* generic_file_llseek - generic llseek implementation for regular files
* @file: file structure to seek on
* @offset: file offset to seek to
* @origin: type of seek
*
* This is a generic implemenation of ->llseek useable for all normal local
* filesystems. It just updates the file offset to the value specified by
* @offset and @origin under i_mutex.
*/
loff_t generic_file_llseek(struct file *file, loff_t offset, int origin)
{
loff_t rval;
mutex_lock(&file->f_dentry->d_inode->i_mutex);
rval = generic_file_llseek_unlocked(file, offset, origin);
mutex_unlock(&file->f_dentry->d_inode->i_mutex);
return rval;
}
/**
* generic_file_llseek_unlocked - lockless generic llseek implementation
* @file: file structure to seek on
* @offset: file offset to seek to
* @origin: type of seek
*
* Updates the file offset to the value specified by @offset and @origin.
* Locking must be provided by the caller.
*/
loff_t
generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin)
{
struct inode *inode = file->f_mapping->host;
switch (origin) {
case SEEK_END:
offset += inode->i_size;
break;
case SEEK_CUR:
/*
* Here we special-case the lseek(fd, 0, SEEK_CUR)
* position-querying operation. Avoid rewriting the "same"
* f_pos value back to the file because a concurrent read(),
* write() or lseek() might have altered it
*/
if (offset == 0)
return file->f_pos;
break;
}
if (offset < 0 || offset > inode->i_sb->s_maxbytes)
return -EINVAL;
/* Special lock needed here? */
if (offset != file->f_pos) {
file->f_pos = offset;
file->f_version = 0;
}
return offset;
}
Say, for example, I have a 4GB file, and I know the offset for the middle portion in the file, how exactly does a lseek()
get me there without traversing the entire file? Does the OS already know where each byte of the file resides?
Yes, it will be fast enough, unless the OS or filesystem is terrible.
RETURN VALUEUpon successful completion, lseek() returns the resulting offset location as measured in bytes from the beginning of the file. Otherwise, a value of (off_t)-1 is returned and errno is set to indicate the error.
The lseek() function shall allow the file offset to be set beyond the end of the existing data in the file. If data is later written at this point, subsequent reads of data in the gap shall return bytes with the value 0 until data is actually written into the gap.
The lseek() function changes the current file offset to a new position in the file. The new position is the given byte offset from the position specified by whence. After you have used lseek() to seek to a new location, the next I/O operation on the file begins at that location.
lseek()
as implemented in ext4
will just increment the file pointer and do some validation checks. It doesn't depend on the file size, meaning it is O(1)
.
Also you can see this in the code, there isn't any loop nor suspicious function calls in there.
However, while this is true on ext4
, it might be not true for other filesystems, as this behaviour isn't guaranteed by POSIX. But it is likely unless the filesystem is meant for a very special purpose.
lseek
's complexity depends on the representation of file in your system. On most modern systems a file is organized by some clever tree-like data structure resulting into seek
being executed in time O(logx(n))
, where n is the size of the file and x some system depending number.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With