
How to insert data as fast as possible with Hibernate

I read a file, create an object from each entry, and store it in a PostgreSQL database. The file contains 100,000 documents, which I read from a single file, split, and finally store in the database. I can't create a List<> and hold all the documents in it, because I have little RAM. My code for reading and writing to the database is below, but the JVM heap fills up and it can't keep storing documents. How can I read the file and store it in the database efficiently?

// bufferedReader, files, wikiParser, database and session are fields of the class
public void readFile() {
    StringBuilder wholeDocument = new StringBuilder();
    try {
        bufferedReader = new BufferedReader(new FileReader(files));
        String line;
        int count = 0;
        while ((line = bufferedReader.readLine()) != null) {
            if (line.contains("<page>")) {
                wholeDocument.append(line);
                // accumulate everything up to the closing tag into one document
                while ((line = bufferedReader.readLine()) != null) {
                    wholeDocument.append('\n').append(line);
                    if (line.contains("</page>")) {
                        System.out.println(count++);
                        addBodyToDatabase(wholeDocument.toString());

                        // reuse the builder for the next document
                        wholeDocument.setLength(0);
                        break;
                    }
                }
            }
        }
        wikiParser.commit();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            bufferedReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

public void addBodyToDatabase(String wholeContent) {
    Page page = new Page(new Timestamp(System.currentTimeMillis()),
            wholeContent);
    database.addPageToDatabase(page);
}

public static int counter = 1;

public void addPageToDatabase(Page page) {
    session.save(page);
    // commit in chunks of 3000 to keep transactions bounded
    if (counter % 3000 == 0) {
        commit();
    }
    counter++;
}
Nasir asked Jan 12 '16

1 Answer

First of all, you should apply a fork-join approach here.

The main task parses the file and sends batches of at most 100 items to an ExecutorService. The ExecutorService should have as many worker threads as there are available database connections. If you have 4 CPU cores, the database might, say, handle 8 concurrent connections without too much context switching.
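A minimal sketch of that producer side, assuming a batch size of 100 and a hypothetical persistBatch method that does the actual Hibernate work:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BatchingReader {

    private static final int BATCH_SIZE = 100;
    private static final int WORKER_COUNT = 8; // one worker per DB connection

    private final ExecutorService executorService =
            Executors.newFixedThreadPool(WORKER_COUNT);

    private List<String> batch = new ArrayList<>(BATCH_SIZE);

    // Call this from the parsing loop instead of addBodyToDatabase(...)
    public void addDocument(String wholeDocument) {
        batch.add(wholeDocument);
        if (batch.size() == BATCH_SIZE) {
            List<String> toPersist = batch; // hand the full batch to a worker
            executorService.submit(() -> persistBatch(toPersist));
            batch = new ArrayList<>(BATCH_SIZE);
        }
    }

    private void persistBatch(List<String> documents) {
        // hypothetical: open a Session, save each Page, then flush and commit
    }
}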

You should then configure a connection-pooling DataSource with minSize equal to maxSize, both set to 8. Try HikariCP or ViburDBCP for connection pooling.
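With HikariCP that could look like this (the JDBC URL and credentials are placeholders):

import javax.sql.DataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolConfig {

    public static DataSource createDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/wiki"); // placeholder
        config.setUsername("user"); // placeholder credentials
        config.setPassword("pass");
        config.setMinimumIdle(8);     // minSize == maxSize, so the pool
        config.setMaximumPoolSize(8); // never grows or shrinks
        return new HikariDataSource(config);
    }
}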

Then you need to configure JDBC batching. If you're using MySQL, the IDENTITY generator will disable batching. If you're using a database that supports sequences, make sure you also use the enhanced identifier generators (they are the default option in Hibernate 5.x).
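Since the question targets PostgreSQL, which supports sequences, a sketch of what that means for the Page entity (the sequence name and batch size are illustrative):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Page {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "page_seq")
    @SequenceGenerator(name = "page_seq", sequenceName = "page_seq",
            allocationSize = 50) // matches the JDBC batch size below
    private Long id;

    // timestamp and content fields as in the original Page class
}

// and in hibernate.cfg.xml (or persistence.xml):
//   hibernate.jdbc.batch_size = 50
//   hibernate.order_inserts   = true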

This way, the entity insert process is parallelized and decoupled from the main parsing thread. The main thread should wait for the ExecutorService to finish processing all tasks before shutting down.
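For example, continuing the BatchingReader sketch above (the one-hour timeout is arbitrary; requires import java.util.concurrent.TimeUnit):

// Call after the parsing loop has submitted the last (possibly partial) batch.
public void awaitCompletion() throws InterruptedException {
    executorService.shutdown(); // no new tasks are accepted
    if (!executorService.awaitTermination(1, TimeUnit.HOURS)) {
        executorService.shutdownNow(); // give up on stragglers
    }
}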

Vlad Mihalcea answered Nov 03 '22