
What do programs see when ZFS can't deliver uncorrupted data?

Say my program tries to read a byte from a file on a ZFS filesystem. ZFS can locate a copy of the necessary block, but no copy has a valid checksum (every copy is corrupted, or the only disks present hold corrupted copies). What does my program see, in terms of the return value from read() and the byte it tried to read? And is there a way to influence this behavior (under Solaris, or any other OS that implements ZFS), that is, to force failure, or to force success with potentially corrupt data?

Jay Kominek asked Mar 01 '23 01:03


2 Answers

EIO is indeed the only answer with current ZFS implementations: read() fails, returning -1 with errno set to EIO, and no data is handed back to the program.
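
From the program's side, that failure can be observed roughly as follows. This is a minimal Python sketch, assuming a Unix-like OS where os.pread is available; read_byte and describe are hypothetical helper names for illustration, not part of any ZFS interface.

```python
import errno
import os


def read_byte(path, offset):
    """Return (data, None) on success or (None, errno_value) on failure.

    data is a bytes object of length 1, or b"" if offset is at or past EOF.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, 1, offset), None
    except OSError as exc:
        # For a block with no valid copy, exc.errno is expected to be
        # errno.EIO; the caller never receives the corrupt byte.
        return None, exc.errno
    finally:
        os.close(fd)


def describe(path, offset):
    """Turn the result of read_byte into a short human-readable string."""
    data, err = read_byte(path, offset)
    if err is None:
        return "read ok: %r" % (data,)
    if err == errno.EIO:
        return "I/O error (EIO): no data returned"
    return "read failed: " + os.strerror(err)
```

The point of separating the two cases is that the program gets either valid data or an error code, never a byte it cannot trust.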

An open ZFS "bug" asks for some way to read corrupted data: http://bugs.opensolaris.org/bugdatabase/printableBug.do?bug_id=6186106

I believe this is already doable with the undocumented but open-source zdb utility. See http://www.cuddletech.com/blog/pivot/entry.php?id=980 for an explanation of how to dump a file's contents using zdb's -R option with the "r" flag.

jlliagre answered Mar 05 '23 19:03


Solaris 10:

# Create a test pool
[root@tesalia z]# cd /tmp
[root@tesalia tmp]# mkfile 100M zz
[root@tesalia tmp]# zpool create prueba /tmp/zz

# Fill the pool
[root@tesalia /]# dd if=/dev/zero of=/prueba/dummy_file
dd: writing to `/prueba/dummy_file': No space left on device
129537+0 records in
129536+0 records out
66322432 bytes (66 MB) copied, 1.6093 s, 41.2 MB/s

# Export (unmount) the pool
[root@tesalia /]# zpool export prueba

# Corrupt the pool on purpose
[root@tesalia /]# dd if=/dev/urandom of=/tmp/zz seek=100000 count=1 conv=notrunc
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.0715209 s, 7.2 kB/s

# Mount the pool again
zpool import -d /tmp prueba

# Try to read the corrupted data
[root@tesalia tmp]# md5sum /prueba/dummy_file 
md5sum: /prueba/dummy_file: I/O error

# Read the manual
[root@tesalia tmp]# man -s2 read
     Upon successful completion,  read()  and  readv()  return  a
     non-negative integer indicating the number of bytes actually
     read. Otherwise, the functions return -1 and  set  errno  to
     indicate the error.

     The read(), readv(), and pread() functions will fail if:
     EIO        A physical I/O error has occurred, [...]

You must export and re-import the test pool. Otherwise the direct overwrite (the pool corruption) goes unnoticed, because the file's data is still cached in OS memory and the read never touches the disk.

And no, currently ZFS will refuse to give you corrupted data. As it should.
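
What a program can still do is read around the damage. The sketch below is a hedged illustration, not a ZFS feature: it assumes the corruption is confined to some blocks and that the remaining blocks still read cleanly. It copies a file block by block and zero-fills any block whose read fails with EIO, similar in spirit to dd conv=noerror,sync. The function name salvage and the block_size parameter are hypothetical; 128 KiB is used as the default only because it matches the default ZFS recordsize.

```python
import errno
import os


def salvage(src, dst, block_size=128 * 1024):
    """Copy src to dst, zero-filling blocks whose read fails with EIO.

    Returns the list of byte offsets of the blocks that could not be read.
    """
    bad_offsets = []
    fd = os.open(src, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        with open(dst, "wb") as out:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    chunk = os.pread(fd, want, offset)
                except OSError as exc:
                    if exc.errno != errno.EIO:
                        raise  # only I/O errors are skipped
                    bad_offsets.append(offset)
                    chunk = b"\0" * want  # placeholder keeps offsets aligned
                if not chunk:
                    break  # file shrank while we were reading
                out.write(chunk)
                offset += len(chunk)
    finally:
        os.close(fd)
    return bad_offsets
```

The returned offsets tell the caller exactly which regions of the copy are placeholders rather than real data.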

jcea answered Mar 05 '23 17:03
