I have something like:
List<Data> dataList = steps.stream()
    .flatMap(step -> step.getPartialDataList().stream())
    .collect(Collectors.toList());
So I'm combining multiple lists from every step into dataList.
My problem is that dataList might grow large enough to cause an OutOfMemoryError. Any suggestions on how I can batch dataList and save the batches into the db?
My primitive idea is to:
for (Step step : steps) {
    List<Data> partialDataList = step.getPartialDataList();
    if (dataList.size() + partialDataList.size() > MAXIMUM_SIZE) {
        saveIntoDb(dataList);
        dataList = new ArrayList<>();
    }
    dataList.addAll(partialDataList); // never drop a partial list
}
if (!dataList.isEmpty()) {
    saveIntoDb(dataList); // flush the remainder
}
PS: I know there is this post, but the difference is that I might not be able to store the whole data set in memory.
LE: The getPartialDataList method is more like createPartialDataList().
Yes, streams are sometimes slower than loops, but they can also be equally fast; it depends on the circumstances. The point to take home is that sequential streams are no faster than loops.
No storage. Streams don't have storage for values; they carry values from a source (which could be a data structure, a generating function, an I/O channel, etc.) through a pipeline of computational steps.
Parallel streams can actually slow you down. A parallel stream breaks the work into subproblems, which then run on separate threads for processing; these can go to different cores and the results are combined when they're done. This all happens under the hood using the fork/join framework.
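For example, here is a minimal self-contained sketch (not from the question's code) of a parallel stream fanning work out over the common fork/join pool:

import java.util.List;

public class ParallelDemo {
    public static void main(String[] args) {
        // Each element may be handled by a different fork/join worker thread.
        List.of(1, 2, 3, 4, 5, 6, 7, 8)
            .parallelStream()
            .forEach(n -> System.out.println(n + " on " + Thread.currentThread().getName()));
    }
}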
Streams are lazy because intermediate operations are not evaluated until a terminal operation is invoked. Each intermediate operation creates a new stream, stores the provided operation/function and returns the new stream. The pipeline accumulates these newly created streams.
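A minimal sketch of this laziness (the printouts are only for demonstration):

import java.util.List;
import java.util.stream.Stream;

public class LazyDemo {
    public static void main(String[] args) {
        Stream<String> pipeline = List.of("a", "b").stream()
            .map(v -> {
                System.out.println("mapping " + v); // runs only when consumed
                return v.toUpperCase();
            });
        System.out.println("pipeline built, nothing mapped yet");
        pipeline.forEach(v -> System.out.println("consumed " + v)); // terminal op triggers evaluation
    }
}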
If your concern is OutOfMemoryError, you probably shouldn't create additional intermediate data structures like lists or streams before saving to the database. Since Step.getPartialDataList() already returns a List<Data>, the data is already in memory (unless you have your own List implementation). You just need to use a JDBC batch insert:
PreparedStatement ps = c.prepareStatement("INSERT INTO data VALUES (?, ?, ...)");
for (Step step : steps) {
    for (Data data : step.getPartialDataList()) {
        ps.setString(1, ...);
        ps.setString(2, ...);
        ...
        ps.addBatch(); // queue the row; nothing is sent yet
    }
}
ps.executeBatch(); // submit the whole batch to the database
There is no need to chunk into smaller batches prematurely with dataList. First see what your database and JDBC driver support before doing premature optimizations.
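Should your driver turn out to need bounded batches after all, here is a minimal sketch of flushing periodically (the table name, the single name column, the Data#getName() getter, and the threshold of 1000 are all assumptions for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

static void insertInChunks(Connection c, List<Step> steps) throws SQLException {
    final int BATCH_SIZE = 1000; // arbitrary threshold; measure before tuning
    try (PreparedStatement ps = c.prepareStatement("INSERT INTO data (name) VALUES (?)")) {
        int pending = 0;
        for (Step step : steps) {
            for (Data data : step.getPartialDataList()) {
                ps.setString(1, data.getName()); // hypothetical getter
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch(); // keep the driver's batch buffer bounded
                    pending = 0;
                }
            }
        }
        if (pending > 0) {
            ps.executeBatch(); // flush the remainder
        }
    }
}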
Do note that for most databases the right way to insert a large amount of data is an external utility and not JDBC, e.g. PostgreSQL has COPY.
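With the PostgreSQL JDBC driver you can also drive COPY from Java through its CopyManager API. A minimal sketch, where the connection URL, the credentials and the data(id, name) table are assumptions:

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) { // hypothetical
            CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
            long rows = copyManager.copyIn(
                "COPY data (id, name) FROM STDIN WITH (FORMAT csv)",
                new StringReader("1,foo\n2,bar\n")); // stream rows instead of buffering them
            System.out.println("copied " + rows + " rows");
        }
    }
}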