http://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_flush_method
Based on the article above, if we choose the O_DIRECT option, the documentation says:
O_DIRECT: InnoDB uses O_DIRECT (or directio() on Solaris) to open the data files, and uses fsync() to flush both the data and log files.
O_DIRECT means that little or no data is cached in the kernel page cache, whereas fsync() is used to flush data from the page cache to the device. So my question is: why does MySQL still call fsync() on the data files when the option is O_DIRECT?
Actually, the explanation is in the documentation you linked, in the paragraph following the description of the O_DIRECT option (highlighting is mine):
O_DIRECT_NO_FSYNC: InnoDB uses O_DIRECT during flushing I/O, but skips the fsync() system call afterward. This setting is suitable for some types of file systems but not others. For example, it is not suitable for XFS. If you are not sure whether the file system you use requires an fsync(), for example to preserve all file metadata, use O_DIRECT instead. This option was introduced in MySQL 5.6.7 (Bug #11754304, Bug #45892).
MySQL bug #45892 contains additional information:
Some testing by Domas has shown that some filesystems (XFS) do not sync metadata without the fsync. If the metadata would change, then you need to still use fsync (or O_SYNC for file open).
For example, if a file grows while O_DIRECT is enabled it will still write to the new part of the file, however since the metadata doesn't reflect the new size of the file the tail portion can be lost in the event of a crash.
Solution:
Continue to use fsync when important metadata changes or use O_SYNC in addition to O_DIRECT.
To sum up: on certain file systems, skipping fsync() after O_DIRECT writes can leave file metadata (such as a new file size) unsynced, so the affected data can be lost in a crash. Since v5.6.7, however, MySQL (well, InnoDB) lets you skip the fsync() on file systems that do not need it by setting innodb_flush_method to O_DIRECT_NO_FSYNC.
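To make the file-growth case from the bug report concrete, here is a minimal Python sketch of the "O_DIRECT write, then fsync()" pattern. It is not InnoDB's code: the function name append_block and the 4096-byte BLOCK size are assumptions made for illustration, and the sketch falls back to a buffered write on platforms or file systems that do not accept O_DIRECT.

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed alignment; the real requirement depends on the device


def append_block(path, data=b"x"):
    """Append one aligned block, bypassing the page cache if possible,
    then fsync() so the new file size (metadata) is persisted as well."""
    buf = mmap.mmap(-1, BLOCK)  # anonymous mappings are page-aligned
    buf.write(data[:BLOCK].ljust(BLOCK, b"\0"))
    base = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    try:
        # First try with O_DIRECT (0 where the flag does not exist),
        # then retry as an ordinary buffered write.
        for extra in (getattr(os, "O_DIRECT", 0), 0):
            try:
                fd = os.open(path, base | extra, 0o600)
            except OSError:
                continue  # e.g. the file system rejected O_DIRECT
            try:
                os.write(fd, buf)  # the file grows by one block here
                os.fsync(fd)       # flushes data and the changed metadata
                return os.path.getsize(path)
            except OSError:
                if not extra:
                    raise  # the buffered attempt failed too
            finally:
                os.close(fd)
        raise OSError("could not open " + path)
    finally:
        buf.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(append_block(os.path.join(tmp, "demo.ibd")))
```

Dropping the os.fsync() call is roughly what O_DIRECT_NO_FSYNC does: the block itself still goes to the device, but on a file system such as XFS the updated file size may not be on disk yet when a crash happens.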
O_DIRECT bypasses the OS cache, but it does not ensure that data is persisted on disk: the writes may only reach the drive's write cache. Once the drive write cache is disabled, the write rate falls to fsync level. O_DIRECT can be a good option if the drive write cache is crash safe (battery backed).
Check this blog for a very thorough analysis