I'm new to Java 8 and have just started using the NIO package for file handling. I need help processing large files, varying from 100,000 to 1,000,000 lines per file, by transforming each line into a specific format and writing the formatted lines to new files. Each generated file must contain at most 100,000 lines.
I'm having a hard time figuring out an approach that will efficiently utilize the new features of Java 8. I've started by determining the number of new files to generate based on the line count of the large file, and then creating those new empty files:
Path largeFile = Paths.get("path/to/file");
// count the lines; try-with-resources closes the underlying file handle
long recordCount;
try (Stream<String> lines = Files.lines(largeFile)) {
    recordCount = lines.count();
}
int maxRecordOfNewFiles = 100000;
int numberOfNewFiles = 1;
if (recordCount > maxRecordOfNewFiles) {
    numberOfNewFiles = Math.toIntExact(recordCount / maxRecordOfNewFiles);
    if (Math.toIntExact(recordCount % maxRecordOfNewFiles) > 0) {
        numberOfNewFiles++;
    }
}
IntStream.rangeClosed(1, numberOfNewFiles).forEach(i -> {
    try {
        Path newFile = Paths.get("path/to/newFiles/newFile" + i + ".txt");
        Files.createFile(newFile);
    } catch (IOException iOex) {
        throw new UncheckedIOException(iOex);
    }
});
But as I go through the lines of the large file via Files.lines(largeFile).forEach(...), I get lost on how to proceed: how do I format the first 100,000 lines and write them to the first of the new files, then the second batch of 100,000 to the second new file, and so on?
Any help will be appreciated. :)
When you start conceiving batch processes, I think you should consider using a framework specialized in that; you may want to handle restarts, scheduling, and so on. Spring Batch is very good for that and already provides what you want: MultiResourceItemWriter, which writes to multiple files with a maximum number of lines per file, and FlatFileItemReader, which reads data from a file.
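For illustration, here is a rough, untested sketch of how those two pieces could be wired together (paths and variable names are placeholders, and in a real job the reader and writer would be registered in a chunk-oriented step):

import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.MultiResourceItemWriter;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.core.io.FileSystemResource;

// Reads the input file line by line.
FlatFileItemReader<String> reader = new FlatFileItemReader<>();
reader.setResource(new FileSystemResource("path/to/file"));
reader.setLineMapper(new PassThroughLineMapper());

// Writes one item per line; your formatting would go in a LineAggregator or an ItemProcessor.
FlatFileItemWriter<String> delegate = new FlatFileItemWriter<>();
delegate.setLineAggregator(new PassThroughLineAggregator<>());

// Rolls over to a new resource (newFile.1, newFile.2, ...) every 100,000 items.
MultiResourceItemWriter<String> writer = new MultiResourceItemWriter<>();
writer.setResource(new FileSystemResource("path/to/newFiles/newFile"));
writer.setItemCountLimitPerResource(100000);
writer.setDelegate(delegate);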
In this case, what you want is to loop over each line of an input file and write a transformation of each line to multiple output files.
One way to do that would be to create a Stream over the lines of the input file, map each line and send it to a custom writer. This custom writer would implement the logic of switching writer when it has reached the maximum number of lines per file.
In the following code, MyWriter opens a BufferedWriter to a file. Each time maxLines is reached (i.e., at every multiple of it), that writer is closed and a new one is opened, incrementing currentFile. This way, it is transparent to the calling code that we're writing to multiple files.
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class FileSplitter {

    public static void main(String[] args) throws IOException {
        try (
            MyWriter writer = new MyWriter(10); // 10 lines per file for the demo; use 100000 in your case
            Stream<String> lines = Files.lines(Paths.get("path/to/file"));
        ) {
            lines.map(l -> /* do transformation here */ l).forEach(writer::write);
        }
    }

    private static class MyWriter implements AutoCloseable {
        private long count = 0, currentFile = 1, maxLines = 0;
        private BufferedWriter bw = null;

        public MyWriter(long maxLines) {
            this.maxLines = maxLines;
        }

        public void write(String line) {
            try {
                // every maxLines lines, close the current file and open the next one
                if (count % maxLines == 0) {
                    close();
                    bw = Files.newBufferedWriter(Paths.get("path/to/newFiles/newFile" + currentFile++ + ".txt"));
                }
                bw.write(line);
                bw.newLine();
                count++;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public void close() throws IOException {
            if (bw != null) bw.close();
        }
    }
}
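Note that Files.lines returns a sequential stream, and MyWriter keeps mutable state (the line count and the current writer). Don't turn the stream parallel, or lines would be written concurrently and interleave across files.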
From what I understand of the question, a simple way can be:
// Pair here is javafx.util.Pair; any simple key/value tuple would do.
try (BufferedReader buff = new BufferedReader(new FileReader(new File("H:\\Docs\\log.txt")))) {
    Pair<Integer, BufferedWriter> ans = buff.lines().reduce(new Pair<Integer, BufferedWriter>(0, null), (count, line) -> {
        try {
            BufferedWriter w;
            if (count.getKey() % 1000 == 0) { // start a new file every 1000 lines (use 100000 in your case)
                if (count.getValue() != null) count.getValue().close();
                w = new BufferedWriter(new FileWriter(new File("f" + count.getKey() + ".txt")));
            } else w = count.getValue();
            w.write(line + "\n"); // do the transformation here
            return new Pair<>(count.getKey() + 1, w);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }, (x, y) -> {
        throw new RuntimeException("Not supported");
    });
    if (ans.getValue() != null) ans.getValue().close();
}
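Note that this only works on a sequential stream: the combiner (the third argument to reduce) is only invoked for parallel streams, which is why it simply throws here.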