I have a method which accept file and size of chunks and return list of chunked files. But the main problem that my line in file could be broken, for example in main file I have next lines:
|1|aaa|bbb|ccc|
|2|ggg|ddd|eee|
After split I could have in one file:
|1|aaa|bbb
In another file:
|ccc|2|
|ggg|ddd|eee|
Here is the code:
public static List<File> splitFile(File file, int sizeOfFileInMB) throws IOException {
int counter = 1;
List<File> files = new ArrayList<>();
int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
byte[] buffer = new byte[sizeOfChunk];
try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
String name = file.getName();
int tmp = 0;
while ((tmp = bis.read(buffer)) > 0) {
File newFile = new File(file.getParent(), name + "."
+ String.format("%03d", counter++));
try (FileOutputStream out = new FileOutputStream(newFile)) {
out.write(buffer, 0, tmp);
}
files.add(newFile);
}
}
return files;
}
Should I use RandomAccessFile class for above purposes (main file is really big - more then 5 Gb)?
If you don't mind to have chunks of different lengths (<=sizeOfChunk but closest to it) then here is the code:
public static List<File> splitFile(File file, int sizeOfFileInMB) throws IOException {
int counter = 1;
List<File> files = new ArrayList<File>();
int sizeOfChunk = 1024 * 1024 * sizeOfFileInMB;
String eof = System.lineSeparator();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String name = file.getName();
String line = br.readLine();
while (line != null) {
File newFile = new File(file.getParent(), name + "."
+ String.format("%03d", counter++));
try (OutputStream out = new BufferedOutputStream(new FileOutputStream(newFile))) {
int fileSize = 0;
while (line != null) {
byte[] bytes = (line + eof).getBytes(Charset.defaultCharset());
if (fileSize + bytes.length > sizeOfChunk)
break;
out.write(bytes);
fileSize += bytes.length;
line = br.readLine();
}
}
files.add(newFile);
}
}
return files;
}
The only problem here is file charset which is default system charset in this example. If you want to be able to change it let me know. I'll add third parameter to "splitFile" function for it.
Just in case anyone is interested in a Kotlin version. It creates an iterator of ByteArray chunks:
class ByteArrayReader(val input: InputStream, val chunkSize: Int, val bufferSize: Int = 1024*8): Iterator<ByteArray> {
var eof: Boolean = false
init {
if ((chunkSize % bufferSize) != 0) {
throw RuntimeException("ChunkSize(${chunkSize}) should be a multiple of bufferSize (${bufferSize})")
}
}
override fun hasNext(): Boolean = !eof
override fun next(): ByteArray {
var buffer = ByteArray(bufferSize)
var chunkWriter = ByteArrayOutputStream(chunkSize) // no need to close - implementation is empty
var bytesRead = 0
var offset = 0
while (input.read(buffer).also { bytesRead = it } > 0) {
if (chunkWriter.use { out ->
out.write(buffer, 0, bytesRead)
out.flush()
offset += bytesRead
offset == chunkSize
}) {
return chunkWriter.toByteArray()
}
}
eof = true
return chunkWriter.toByteArray()
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With