Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java: Concurrent reads on an InputStream

Been looking around for a little while now and I'm a bit confused on this issue. I want to be able to take an input stream and read it concurrently in segments. The segments don't interact with each other they are just values to be inserted or updated in a database from an uploaded file. Is it possible to read an input stream concurrently by setting a segment size and then just skipping forward before spinning off a new thread to handle the conversion and insert/update?

Essentially the file is a list of ID's (one ID per line), although it would be preferable if I could specify a separator. Some files can be huge so I would like to process and convert the data to be into segments so that after inserting/updating to the database the JVM memory can be freed up. Is this possible? And if so are there any libraries out there that do this already?

Cheers and thanks in advance,

Alexei Blue.

like image 434
Alexei Blue Avatar asked Apr 23 '13 01:04

Alexei Blue


People also ask

Can we read InputStream twice?

Depending on where the InputStream is coming from, you might not be able to reset it. You can check if mark() and reset() are supported using markSupported() . If it is, you can call reset() on the InputStream to return to the beginning. If not, you need to read the InputStream from the source again.

Is InputStream thread safe?

Note: The ObservableInputStream is not thread safe, as instances of InputStream usually aren't. If you must access the stream from multiple threads, then synchronization, locking, or a similar means must be used.

Does closing InputStreamReader close the InputStream?

Closing an InputStreamReader will also close the InputStream instance from which the InputStreamReader is reading.

Do I need to close InputStreamReader Java?

read(); // throws exception: stream is closed. Therefore, if you close the Reader, you don't need to also close the InputStream.


1 Answers

A good approach might instead be to have a single reader that reads chunks and then hands each chunk off to a worker thread from a thread pool. Given that these will be inserted into a database the inserts will be by far the slow parts compared to reading the input so a single thread should suffice for reading.

Below is an example that hands off processing of each line from System.in to a worker thread. Performance of database inserts is much better if you perform a large number inserts within a single transaction so passing in a group of say 1000 lines would be better than passing in a single line as in the example.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Main {
    public static class Worker implements Runnable {
        private final String line;

        public Worker(String line) {
            this.line = line;
        }

        @Override
        public void run() {
            // Process line here.
            System.out.println("Processing line: " + line);
        }
    }

    public static void main(String[] args) throws IOException {
        // Create worker thread pool.
        ExecutorService service = Executors.newFixedThreadPool(4);

        BufferedReader buffer = new BufferedReader(new InputStreamReader(System.in));
        String line;

        // Read each line and hand it off to a worker thread for processing.
        while ((line = buffer.readLine()) != null) {
            service.execute(new Worker(line));
        }
    }
}
like image 160
Ed Plese Avatar answered Sep 23 '22 03:09

Ed Plese