What would be the best approach to reverse a large text file that is uploaded asynchronously to a servlet that reverses this file in a scalable and efficient way?
I was thinking of using Java NIO to treat file as an array on disk (so that I don't have to treat the file as a string buffer in memory). Also, I am thinking of using MapReduce to break up the file and process it in separate machines.
If it is uploaded to you and you can get the length at the beginning, you could just create an empty full-sized file up front and write to it starting from the back and working your way to the front using seek
You'd probably want to define a block size (like 1K?) and reverse that much in memory before writing it out to the file.
That's a pretty tough task. If you can ensure that the HTTP Content-Length
and Content-Type
headers are present in the upload request (or in the multipart body when it's a multipart/form-data
request), then it would be an easy job with help of RandomAccessFile
. The content length is mandatory so that the RandomAccessFile
knows how long the file will be and write the character at the position you want it to be. The character encoding (which is usually present as an attribute of the content type header) is mandatory to know how many bytes a character will take into account (because RandomAccessFile
is byte based and for example UTF-8 encoding is variable-byte-length).
Here's a kickoff example (leaving obvious exception handling aside):
package com.stackoverflow.q2725897;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
public class Test {
public static void main(String... args) throws Exception {
// Stub input. You need to gather it yourself from your sources.
File file = new File("/file.txt");
long length = file.length(); // Get it from HTTP request header using file upload API in question (Commons FileUpload?).
String encoding = "UTF-8"; // Get it from HTTP request header using file upload API in question (Commons FileUpload?).
InputStream content = new FileInputStream(file); // Get it from HTTP request body using file upload API in question (Commons FileUpload?).
// Now the real job.
Reader input = new InputStreamReader(content, encoding);
RandomAccessFile output = new RandomAccessFile(new File("/filereversed.txt"), "rwd");
CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
for (int data; (data = input.read()) != -1;) {
ByteBuffer bytes = encoder.encode(CharBuffer.wrap(new char[] { (char) data }));
length -= bytes.limit();
output.seek(length);
output.write(bytes.array());
}
// Should actually be done in finally.
input.close();
output.close();
}
}
If those headers are not present (especially Content-length
is important), then you'll really need to store it on disk first until end of stream and then re-read and reverse it the same way with help of RandomAccessFile
.
Update: it would actually be tougher than it look like. Is the character encoding of the input always guaranteed to be the same? If so, what one would it be? Also, what would you like to do with for example surrogate characters and newlines? The above example doesn't take that into account correctly. But it at least gives the base idea.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With