I have a 35GB CSV file. I want to read each line, and write the line out to a new CSV if it matches a condition.
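try (BufferedReader br = Files.newBufferedReader(Paths.get("source.csv"))) {
    try (BufferedWriter writer = Files.newBufferedWriter(Paths.get("target.csv"))) {
        br.lines().parallel() // parallel forEach does not preserve the input line order
            .filter(line -> StringUtils.isNotBlank(line)) // a bit more complex in the real world
            .forEach(line -> {
                try {
                    writer.write(line + "\n");
                } catch (IOException e) {
                    throw new UncheckedIOException(e); // lambdas cannot throw checked exceptions
                }
            });
    }
}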
This takes approx. 7 minutes. Is it possible to speed up that process even more?
If it is an option, you could use GZIPInputStream/GZIPOutputStream to minimize disk I/O.
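A minimal sketch of the compressed variant, assuming both the input and the output are allowed to be gzip files. The class name GzipLineFilter, the method name, and the blank-line check standing in for the real condition are made up for illustration. Whether this helps depends on whether the job is bound by disk or by CPU, so it is worth measuring.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLineFilter {

    /** Copies non-blank lines from a gzip file to a new gzip file; returns the number of lines written. */
    public static long filterNonBlankLines(Path sourceGz, Path targetGz) throws IOException {
        long written = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                 new GZIPInputStream(Files.newInputStream(sourceGz)), StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                 new GZIPOutputStream(Files.newOutputStream(targetGz)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) { // stand-in for the real condition
                    writer.write(line);
                    writer.write('\n');
                    written++;
                }
            }
        } // closing the writer also finishes the gzip stream
        return written;
    }
}
```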
Files.newBufferedReader/Writer use a default buffer size, 8 KB I believe. You might try a larger buffer.
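Files.newBufferedReader and Files.newBufferedWriter do not take a buffer size, so a larger buffer means constructing the reader and writer by hand. A rough sketch, where the class name LargeBufferCopy and the 1 MiB size are arbitrary choices for illustration and the blank-line check stands in for the real condition:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LargeBufferCopy {

    // 1 MiB is only an illustrative value; measure to see which size actually helps.
    private static final int BUFFER_SIZE = 1 << 20;

    /** Copies non-blank lines from source to target; returns the number of lines written. */
    public static long copyNonBlankLines(Path source, Path target) throws IOException {
        long written = 0;
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(Files.newInputStream(source), StandardCharsets.UTF_8), BUFFER_SIZE);
             BufferedWriter writer = new BufferedWriter(
                 new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8), BUFFER_SIZE)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) { // stand-in for the real condition
                    writer.write(line);
                    writer.write('\n');
                    written++;
                }
            }
        }
        return written;
    }
}
```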
Converting to String (Unicode) slows things down too, and uses twice the memory. Decoding UTF-8, the default charset here, is not as simple as decoding StandardCharsets.ISO_8859_1.
It would be best to work with bytes for the most part and convert only specific CSV fields to String.
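As a sketch of what working on bytes could look like: the code below splits the input on the byte '\n' and copies matching lines without ever building a String. The class name ByteLineFilter and the blank-line check are placeholders for illustration. It assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, where the byte 0x0A only ever means a line feed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ByteLineFilter {

    /** Copies non-blank lines from in to out as raw bytes; returns the number of lines written. */
    public static long filterNonBlankLines(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1 << 16];
        ByteArrayOutputStream line = new ByteArrayOutputStream(); // current line, may span several reads
        long written = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            int start = 0;
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n') {
                    line.write(buf, start, i - start + 1); // keep the newline with the line
                    written += emitIfNonBlank(line, out);
                    start = i + 1;
                }
            }
            line.write(buf, start, n - start); // carry the unfinished tail into the next read
        }
        if (line.size() > 0) {
            written += emitIfNonBlank(line, out); // last line without a trailing newline
        }
        return written;
    }

    // Writes the collected line if it contains any non-whitespace byte, then resets the collector.
    private static int emitIfNonBlank(ByteArrayOutputStream line, OutputStream out) throws IOException {
        byte[] bytes = line.toByteArray();
        line.reset();
        for (byte b : bytes) {
            if (b != ' ' && b != '\t' && b != '\r' && b != '\n') {
                out.write(bytes);
                return 1;
            }
        }
        return 0;
    }
}
```

In practice the streams would be wrapped in BufferedInputStream/BufferedOutputStream, and a field needed as text could be decoded from its byte slice on demand.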
A memory-mapped file might be the most appropriate. Parallelism can then be applied per file range, by splitting up the file.
try (FileChannel sourceChannel = new RandomAccessFile("source.csv","r").getChannel(); ...
MappedByteBuffer buf = sourceChannel.map(...);
This takes a fair amount of code, since the lines have to be split correctly on (byte) '\n', but it is not overly complex.
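One way the range splitting could look, as a sketch rather than a finished solution. A single mapping cannot exceed Integer.MAX_VALUE bytes, so a 35GB file has to be mapped in several ranges anyway. All names here (MappedLineRanges, splitAtNewlines, countNonBlankLines) are invented for the example, and to stay short it only counts matching lines per range instead of writing them. Writing would additionally need, for instance, one output file per range that is concatenated afterwards.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class MappedLineRanges {

    /** Splits the file into {start, end} ranges of at least chunkSize bytes, each ending just after a '\n'. */
    public static List<long[]> splitAtNewlines(FileChannel ch, long chunkSize) throws IOException {
        long size = ch.size();
        List<long[]> ranges = new ArrayList<>();
        long start = 0;
        while (start < size) {
            long end = Math.min(start + chunkSize, size);
            if (end < size) {
                end = endOfLine(ch, end - 1, size); // extend the range to the end of its last line
            }
            ranges.add(new long[] {start, end});
            start = end;
        }
        return ranges;
    }

    // Returns the position just after the first '\n' at or after 'from', or 'size' if there is none.
    private static long endOfLine(FileChannel ch, long from, long size) throws IOException {
        ByteBuffer probe = ByteBuffer.allocate(8192);
        long pos = from;
        while (pos < size) {
            probe.clear();
            int n = ch.read(probe, pos);
            if (n <= 0) break;
            for (int i = 0; i < n; i++) {
                if (probe.get(i) == '\n') return pos + i + 1;
            }
            pos += n;
        }
        return size;
    }

    /** Maps one range (assumed to be smaller than 2 GB) and counts its non-blank lines. */
    public static long countNonBlankLines(FileChannel ch, long start, long end) throws IOException {
        MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, start, end - start);
        long count = 0;
        boolean nonBlank = false;
        while (buf.hasRemaining()) {
            byte b = buf.get();
            if (b == '\n') {
                if (nonBlank) count++;
                nonBlank = false;
            } else if (b != ' ' && b != '\t' && b != '\r') {
                nonBlank = true;
            }
        }
        if (nonBlank) count++; // final line without a trailing newline
        return count;
    }

    /** Processes all ranges in parallel and sums the per-range counts. */
    public static long countParallel(Path file, long chunkSize) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return splitAtNewlines(ch, chunkSize).parallelStream()
                .mapToLong(r -> {
                    try {
                        return countNonBlankLines(ch, r[0], r[1]);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .sum();
        }
    }
}
```

For a real file the chunk size would be something like a few hundred megabytes. It must stay far enough below 2 GB that a range still fits into one mapping after being extended to the end of its last line.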