Suppose I want to get several of a file's properties (owner, size, permissions, times) as returned by the lstat() system call. One way to do this in Java is to create a java.io.File object and do calls like length(), lastModified(), etc. on it. I have two problems so far:
Each one of these calls triggers a stat() call, and for my purposes stat()s are considered expensive: I'm trying to scan billions of files in parallel on hundreds of hosts, and (to a first approximation) the only way to access these files is via NFS, often against filer clusters where stat() under load may take half a second.
The call isn't lstat(), it's typically stat() (which follows symlinks) or fstat64() (which opens the file and may trigger a write operation to record the access time).
Is there a "right" way to do this, such that I end up just doing a single lstat() call and accessing the members of the struct stat? What I have found so far from Googling:
JDK 7 will have the PosixFileAttributes interface in java.nio.file with everything I want (but I'd rather not be running nightly builds of my JDK if I can avoid it).
I can roll my own interface with JNI or JNA (but I'd rather not if there's an existing one).
A previous similar question got a couple of suggested JNI/JNA implementations. One is gone and the other is questionably maintained (e.g., no downloads, just an hg repository).
Are there any better options out there?
lstat() is identical to stat(), except that if pathname is a symbolic link, then it returns information about the link itself, not the file that the link refers to. fstat() is identical to stat(), except that the file about which information is to be retrieved is specified by the file descriptor fd.
The lstat() function gets status information about a specified file and places it in the area of memory pointed to by buf. If the named file is a symbolic link, lstat() returns information about the symbolic link itself. The information is returned in the stat structure, referenced by buf.
st_mode is of type mode_t, which POSIX defines only as an integer type; on Linux it is commonly an unsigned int. The notion of a base (decimal, octal, or hexadecimal) matters only when you convert the number from its native in-memory form to a text representation, or back from text to native.
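To illustrate the point about bases, here is a small Java sketch that prints one and the same integer in three textual forms. The class and method names (ModeBaseSketch, describe) are hypothetical, and the sample value assumes the Linux convention in which a regular file with rw-r--r-- permissions has the mode 0100644 in octal.

```java
class ModeBaseSketch {
    // Render one in-memory integer in three different text representations.
    public static String describe(int mode) {
        return "decimal=" + mode
                + " octal=" + Integer.toOctalString(mode)
                + " hex=" + Integer.toHexString(mode);
    }

    public static void main(String[] args) {
        // A leading 0 makes this an octal literal in Java source.
        // Assumed sample value: regular file, permissions rw-r--r-- (Linux convention).
        int mode = 0100644;
        System.out.println(describe(mode));

        // Masking with 0777 keeps only the permission bits.
        System.out.println("permission bits (octal): " + Integer.toOctalString(mode & 0777));
    }
}
```

The value stored in memory never changes; only the string produced for display does.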
The stat() function in C retrieves the properties of the file identified by path and writes them into the structure pointed to by buf. It is declared in the sys/stat.h header file.
Looks like you've pretty much covered all the bases. When I started reading your question my first thought was JDK 7 or JNI. Without knowing anything about the change pattern on these files you might also look into some sort of persistent cache of the information in question, like an embedded DB. You could also look at some other access method besides NFS, like a custom web service that provides bulk file information from a remote host.
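For reference, here is a minimal sketch of the JDK 7 route mentioned above, assuming a JDK that ships java.nio.file and a filesystem that supports POSIX attributes. The class and method names (LstatSketch, lstat) are made up for illustration. Passing LinkOption.NOFOLLOW_LINKS asks for the attributes of the link itself rather than its target, and all fields come back from one readAttributes call. Whether that maps to exactly one lstat() on a given platform is an implementation detail worth confirming with strace.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

class LstatSketch {
    // Fetch all POSIX attributes in one bulk read, without following symlinks.
    public static PosixFileAttributes lstat(Path path) throws IOException {
        return Files.readAttributes(path, PosixFileAttributes.class,
                LinkOption.NOFOLLOW_LINKS);
    }

    public static void main(String[] args) throws IOException {
        // Default to the current directory if no path is given.
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        PosixFileAttributes attrs = lstat(path);

        // Every value below is read from the already-fetched attributes object.
        System.out.println("owner:    " + attrs.owner().getName());
        System.out.println("group:    " + attrs.group().getName());
        System.out.println("size:     " + attrs.size());
        System.out.println("perms:    " + PosixFilePermissions.toString(attrs.permissions()));
        System.out.println("modified: " + attrs.lastModifiedTime());
        System.out.println("accessed: " + attrs.lastAccessTime());
        System.out.println("symlink:  " + attrs.isSymbolicLink());
    }
}
```

The point of the bulk read is that the getters on the returned object do not go back to the filesystem, unlike separate java.io.File calls such as length() and lastModified().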
Yes, stat() sits underneath all of these calls and libraries, so this is a latency problem. You can, however, issue many stat() calls at once, since the NFS server runs many daemons to service your connections; that means using threads, unless someone has an asynchronous stat() up their sleeve! If you could get onto the host itself, say via ssh, stat() would be much cheaper. You could even write a TCP service that streams paths in and streams stat() results out. Unfortunately, access to the NFS server is often hard or impossible: it may only have admin accounts, or it may be a Hitachi SAN or something similar.
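The "many stat() calls at once, using threads" idea could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (ParallelStatSketch, statAll) are hypothetical, the thread count is a tuning knob you would have to measure against your own filers, and paths that fail to stat are simply skipped here.

```java
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelStatSketch {
    // Stat every path on a fixed-size thread pool so that slow NFS round trips overlap.
    public static Map<Path, BasicFileAttributes> statAll(List<Path> paths, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per path; each task performs a single bulk attribute read.
            List<Future<BasicFileAttributes>> futures = new ArrayList<>();
            for (Path p : paths) {
                Callable<BasicFileAttributes> task = () -> Files.readAttributes(
                        p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
                futures.add(pool.submit(task));
            }

            // Collect results in submission order.
            Map<Path, BasicFileAttributes> results = new LinkedHashMap<>();
            for (int i = 0; i < paths.size(); i++) {
                try {
                    results.put(paths.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    // The stat failed (missing file, permissions, I/O error): skip this path.
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would pass a batch of paths and a thread count, for example statAll(batch, 64), and then read size() or lastModifiedTime() from each returned attributes object without further filesystem calls.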