I am using a Java pipe to pass data from an unzip module (the JavaUncompress class) to a parsing module (the handler class). The file is large, so I want to unzip it and parse it directly rather than saving the unzipped file first and then parsing it. However, this only works for small files. When I input a 1 GB file, it seems that only part of the file (say 50000 lines) is piped from the output stream to the input stream of the parsing module.
I also tried saving the uncompressed data in a String, and the same thing happened: the String contains only part of the unzipped file (it stops at the same 50000th line as the piped version). Any idea what is going on? Thank you very much.
{
    PipedInputStream in = new PipedInputStream(); // to output
    final PipedOutputStream out = new PipedOutputStream(in); // out is something from other
    new Thread(
        new Runnable(){
            public void run(){
                JavaUncompress.putDataOnOutputStream(inFile,out);
            }
        }
    ).start();
    doc = handler.processDataFromInputStream(in);
}
public static void putDataOnOutputStream(String inZipFileName, PipedOutputStream out){
    try {
        FileInputStream fis = new FileInputStream(inZipFileName);
        //FilterInputStream ftis = new FilterInputStream;
        ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
        ZipEntry entry;
        while((entry = zis.getNextEntry()) != null) {
            System.out.println("Extracting: " +entry);
            byte data[] = new byte[BUFFER];
            long len = entry.getSize();
            long blk = len/BUFFER;
            int rem = (int)(len - blk*BUFFER);
            System.out.println(len+" = "+blk +"*BUFFER + "+rem);
            for(long i=0; i!=blk; ++i){
                if ((zis.read(data, 0, BUFFER)) != -1) {
                    out.write(data);
                }
            }
            byte dataRem[] = new byte[rem];
            if ((zis.read(dataRem, 0, rem)) != -1) {
                out.write(dataRem);
                out.flush();
                out.close();
            }
        }
        zis.close();
    } catch(Exception e) {
        e.printStackTrace();
    }
}
PipedOutputStream.write() will block once the corresponding PipedInputStream falls more than a buffer's worth behind it (1024 bytes by default), but why do this at all? Why not just unzip the file and process it in the same thread? There's no advantage to multi-threading it; it's just a pointless complication.
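A rough sketch of the same-thread approach, not a drop-in replacement. It also addresses what is probably a contributing problem in the posted loop: InputStream.read(byte[], int, int) may return fewer bytes than requested, and ZipEntry.getSize() can return -1 when the size is unknown, so counting blocks in advance is unreliable. The class name SameThreadUnzip and the helpers copyEntries and zipOf are made up for illustration; zipOf only builds a test archive in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class SameThreadUnzip {

    // Copies the uncompressed bytes of every entry to 'out'.
    // Only the bytes actually returned by read() are written.
    public static long copyEntries(InputStream zipped, OutputStream out) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (ZipInputStream zis = new ZipInputStream(zipped)) {
            while (zis.getNextEntry() != null) {
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zis.read(buf, 0, buf.length)) != -1) {
                    out.write(buf, 0, n);
                    total += n;
                }
            }
        }
        return total;
    }

    // Hypothetical helper: builds a one-entry zip archive in memory for the demo.
    public static byte[] zipOf(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("data.txt"));
            zos.write(payload);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[1_000_000];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) (i % 251);
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyEntries(new ByteArrayInputStream(zipOf(payload)), sink);
        System.out.println("copied " + copied + " bytes");
    }
}
```

Since a ZipInputStream is itself an InputStream that ends when the current entry ends, in the poster's case it may be enough to call getNextEntry() once and hand the stream straight to handler.processDataFromInputStream(...), with no pipe or second thread at all. That assumes the parser reads until end of stream.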
I've used pipes exactly once in 15 years in Java and I pretty quickly changed it to a queue.
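For anyone who does need two threads, here is one way the queue idea might look. This is only a sketch under my own assumptions: the answer above does not say which queue was used, so ArrayBlockingQueue is a guess, and the names ChunkQueueDemo, produceAndConsume and END are invented for the example. The producer hands over byte chunks and finishes with a sentinel so the consumer knows when to stop.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ChunkQueueDemo {

    // Sentinel marking end of data; compared by identity, never by content.
    private static final byte[] END = new byte[0];

    // Splits 'source' into chunks on a producer thread and counts the bytes
    // received on the calling thread.
    public static long produceAndConsume(byte[] source, int chunkSize) throws InterruptedException {
        // Bounded queue: put() blocks when the consumer falls 16 chunks behind.
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            try {
                for (int off = 0; off < source.length; off += chunkSize) {
                    int end = Math.min(source.length, off + chunkSize);
                    queue.put(Arrays.copyOfRange(source, off, end));
                }
                queue.put(END); // always signal completion
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        long total = 0;
        for (byte[] chunk = queue.take(); chunk != END; chunk = queue.take()) {
            total += chunk.length; // a real consumer would parse the chunk here
        }
        producer.join();
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long total = produceAndConsume(new byte[100_000], 4096);
        System.out.println("received " + total + " bytes");
    }
}
```

Unlike a pipe, each chunk carries its own length, so nothing depends on how many bytes a single read or write happens to move.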