What I would like to do is shuffle the rows read from a CSV file, then write the first 10,000 shuffled rows to one CSV file and the remainder to a separate CSV file. With a smaller file I can do something like:
java.util.Collections.shuffle(...)
for (int i=0; i < 10000; i++) printcsv(...)
for (int i=10000; i < data.length; i++) printcsv(...)
However, with very large files I now get an OutOfMemoryError.
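You could:
Use more memory (for example, raise the JVM heap limit with -Xmx), or
Shuffle only a collection of row numbers rather than the actual CSV rows, then read the input file line by line (buffered, of course) and write each line to one of the two output files, depending on whether its row number is among the first 10,000 shuffled numbers.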
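The second option might look roughly like the sketch below. It is only an illustration under some assumptions: the class name CsvRandomSplit and the method split are made up for this example, every CSV record is assumed to sit on exactly one line (no quoted fields containing line breaks), there is no header row, and the file has fewer than Integer.MAX_VALUE rows. Only an int array of row numbers and a BitSet are held in memory, never the rows themselves.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;
import java.util.Random;

public class CsvRandomSplit {

    // Hypothetical helper: writes `sampleSize` randomly chosen lines of `input`
    // to `sample` and every other line to `rest`.
    static void split(Path input, Path sample, Path rest, int sampleSize, Random rnd)
            throws IOException {
        // Pass 1: count the rows without keeping them in memory.
        int rowCount = 0;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            while (in.readLine() != null) {
                rowCount++;
            }
        }

        // Shuffle the row numbers only (Fisher-Yates on an int array).
        int[] rows = new int[rowCount];
        for (int i = 0; i < rowCount; i++) {
            rows[i] = i;
        }
        for (int i = rowCount - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = rows[i];
            rows[i] = rows[j];
            rows[j] = tmp;
        }

        // Mark the first `sampleSize` shuffled row numbers as selected.
        BitSet selected = new BitSet(rowCount);
        for (int i = 0; i < Math.min(sampleSize, rowCount); i++) {
            selected.set(rows[i]);
        }

        // Pass 2: stream the file again and route each line to one output.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter outSample = Files.newBufferedWriter(sample, StandardCharsets.UTF_8);
             BufferedWriter outRest = Files.newBufferedWriter(rest, StandardCharsets.UTF_8)) {
            String line;
            for (int row = 0; (line = in.readLine()) != null; row++) {
                BufferedWriter out = selected.get(row) ? outSample : outRest;
                out.write(line);
                out.newLine();
            }
        }
    }
}
```

It could be called as CsvRandomSplit.split(Paths.get("in.csv"), Paths.get("first.csv"), Paths.get("rest.csv"), 10000, new Random()), where the file names are placeholders. Note that with this approach the rows inside each output file stay in their original relative order; the selection is random, but the order is not. If the 10,000-row file also has to be in shuffled order, it is small enough to load and shuffle in memory afterwards.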