
Concurrency of RandomAccessFile in Java

I am creating a RandomAccessFile object to write to a file (on an SSD) from multiple threads. Each thread writes a direct byte buffer at a specific position within the file, and I ensure that the position at which one thread writes never overlaps with another thread's:

file_.getChannel().write(buffer, position);

where file_ is an instance of RandomAccessFile and buffer is a direct byte buffer.

For the RandomAccessFile object, since I'm not using fallocate to allocate the file, and the file's length is changing, will this utilize the concurrency of the underlying media?

If it is not, is there any point in using the above function without calling fallocate while creating the file?

asked Jul 30 '17 by user1715122



2 Answers

I did some testing with the following code:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Random;
import java.util.concurrent.CountDownLatch;

import org.apache.commons.io.IOUtils;

public class App {
    public static CountDownLatch latch;

    public static void main(String[] args) throws InterruptedException, IOException {
        RandomAccessFile file = new RandomAccessFile("test.txt", "rw");
        latch = new CountDownLatch(5);
        // Start five writers, each with its own starting position in the file.
        for (int i = 0; i < 5; i++) {
            Thread t = new Thread(new WritingThread(i, (long) i * 10, file.getChannel()));
            t.start();
        }
        latch.await();
        file.close();

        // Dump the file contents to see what each thread actually wrote.
        InputStream fileR = new FileInputStream("test.txt");
        byte[] bytes = IOUtils.toByteArray(fileR);
        fileR.close();
        for (int i = 0; i < bytes.length; i++) {
            System.out.println(bytes[i]);
        }
    }

    public static class WritingThread implements Runnable {
        private long startPosition;
        private FileChannel channel;
        private int id;

        public WritingThread(int id, long startPosition, FileChannel channel) {
            this.startPosition = startPosition;
            this.channel = channel;
            this.id = id;
        }

        // Builds a 10-byte buffer whose content identifies the writing thread.
        private ByteBuffer generateStaticBytes() {
            ByteBuffer buf = ByteBuffer.allocate(10);
            byte[] b = new byte[10];
            for (int i = 0; i < 10; i++) {
                b[i] = (byte) (this.id * 10 + i);
            }
            buf.put(b);
            buf.flip();
            return buf;
        }

        @Override
        public void run() {
            Random r = new Random();
            // Keep writing 10-byte chunks at increasing positions until a random stop condition.
            while (r.nextInt(100) != 50) {
                try {
                    System.out.println("Thread " + id + " is Writing");
                    this.channel.write(this.generateStaticBytes(), this.startPosition);
                    this.startPosition += 10;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            latch.countDown();
        }
    }
}

So far what I've seen:

  • Windows 7 (NTFS partition): writes run sequentially (one thread writes, and when it finishes, another gets to run)

  • Linux Parrot 4.8.15 (Debian-based distro, ext4 partition), with Linux kernel 4.8.0: threads intermingle during execution

Again, as the documentation says:

File channels are safe for use by multiple concurrent threads. The close method may be invoked at any time, as specified by the Channel interface. Only one operation that involves the channel's position or can change its file's size may be in progress at any given time; attempts to initiate a second such operation while the first is still in progress will block until the first operation completes. Other operations, in particular those that take an explicit position, may proceed concurrently; whether they in fact do so is dependent upon the underlying implementation and is therefore unspecified.

So I'd suggest first giving it a try to see whether the OS(es) you are going to deploy your code to (and possibly the filesystem type) support parallel execution of FileChannel.write calls.

Edit: As pointed out, the above does not mean that threads can write concurrently to the file; it is actually the opposite, as the write call behaves according to the contract of a WritableByteChannel, which clearly specifies that only one thread at a time can write to a given channel:

If one thread initiates a write operation upon a channel then any other thread that attempts to initiate another write operation will block until the first operation is complete

answered Oct 13 '22 by Adonis


As the documentation states and Adonis already mentions, a write can only be performed by one thread at a time. You won't achieve performance gains through concurrency; moreover, you should only worry about performance if it is an actual issue, because writing concurrently to a disk may actually degrade your performance (probably less so for SSDs than HDDs).

The underlying media is in most cases (SSD, HDD, network) single-threaded; in fact, there is no such thing as a thread at the hardware level, threads are nothing but an abstraction.

In your case the media is an SSD. While the SSD internally may write data to multiple modules concurrently (it may reach a level of parallelism where writes are as fast as, or even outperform, reads), its internal mapping data structures are a shared resource and therefore contended, especially under frequent updates such as concurrent writes. Nevertheless, updates to these data structures are quite fast and therefore nothing to worry about unless they become a problem.

But apart from this, those are just internals of the SSD. On the outside you communicate over a Serial ATA interface, thus one byte at a time (actually packets in a Frame Information Structure, FIS). On top of this sits an OS/filesystem that again has probably contended data structures and/or applies its own means of optimization, such as write-behind caching.

Further, since you know what your media is, you can optimize especially for it, and SSDs are really fast when a single thread writes a large piece of data.

Thus, instead of using multiple threads for writing, you may create a large in-memory buffer (or consider a memory-mapped file) and write concurrently into this buffer. The memory itself is not contended as long as you ensure each thread accesses its own region of the buffer. Once all threads are done, you write this one buffer to the SSD (not needed if using a memory-mapped file); a minimal sketch of this approach follows below.
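
A minimal sketch of that idea, assuming a memory-mapped file; the file name, region size and thread count are made up for illustration. Each thread fills its own, non-overlapping slice of the mapping, so the write path itself is plain memory access, and the mapping is flushed to the device once at the end:

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedWriteSketch {

    static final int REGION_SIZE = 1024;   // bytes per thread, assumed for illustration
    static final int THREADS = 4;

    public static void main(String[] args) throws Exception {
        long fileSize = (long) THREADS * REGION_SIZE;

        try (RandomAccessFile raf = new RandomAccessFile("mapped.bin", "rw");
             FileChannel channel = raf.getChannel()) {

            // Map the whole file once; this also sets its length.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, fileSize);

            Thread[] workers = new Thread[THREADS];
            for (int i = 0; i < THREADS; i++) {
                final int id = i;
                workers[i] = new Thread(() -> {
                    // Each thread works on its own duplicate with a disjoint position/limit window.
                    ByteBuffer slice = map.duplicate();
                    slice.position(id * REGION_SIZE).limit((id + 1) * REGION_SIZE);
                    for (int b = 0; b < REGION_SIZE; b++) {
                        slice.put((byte) id);   // plain in-memory writes, no channel call involved
                    }
                });
                workers[i].start();
            }
            for (Thread t : workers) {
                t.join();
            }

            // Optionally force the mapping out to the device; the OS would flush it eventually anyway.
            map.force();
        }
    }
}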

See also this good summary about developing for SSDs: A Summary – What every programmer should know about solid-state drives

The point of doing pre-allocation (or, to be more precise, file_.setLength(), which actually maps to ftruncate) is that resizing the file may cost extra cycles, and you may want to avoid that. But again, this may depend on the OS/filesystem. A minimal sketch is shown below.
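
A hedged sketch of such pre-allocation; the file name and size are assumptions for illustration only. Setting the length up front means later positional writes never need to grow the file:

import java.io.IOException;
import java.io.RandomAccessFile;

public class PreallocateSketch {
    public static void main(String[] args) throws IOException {
        long expectedSize = 64L * 1024 * 1024;   // 64 MiB, an assumed final file size

        try (RandomAccessFile file = new RandomAccessFile("data.bin", "rw")) {
            // Grow (or truncate) the file to its final size up front,
            // so subsequent positional writes do not have to extend it.
            file.setLength(expectedSize);

            // ... hand file.getChannel() to the writer threads as before ...
        }
    }
}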

answered Oct 13 '22 by Gerald Mücke