
S3 Java client fails a lot with "Premature end of Content-Length delimited message body" or "java.net.SocketException: Socket closed"

I have an application that does a lot of work on S3, mostly downloading files from it. I am seeing a lot of these kinds of errors and I'd like to know if the problem is in my code or if the service is really this unreliable.

The code I'm using to read from the S3 object stream is as follows:

public static final void write(InputStream stream, OutputStream output) {
  byte[] buffer = new byte[1024];
  int read = -1;

  try {
    while ((read = stream.read(buffer)) != -1) {
      output.write(buffer, 0, read);
    }

    stream.close();
    output.flush();
    output.close();
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}

This OutputStream is a new BufferedOutputStream(new FileOutputStream(file)). I am using the latest version of the Amazon S3 Java client, and each call is retried four times before giving up, so even after four attempts it still fails.

Any hints or tips on how I could possibly improve this are appreciated.

asked Mar 31 '12 by Maurício Linhares

3 Answers

I just managed to overcome a very similar problem. In my case the exception I was getting was identical; it happened for larger files but not for small files, and it never happened at all while stepping through the debugger.

The root cause of the problem was that the AmazonS3Client object was getting garbage collected in the middle of the download, which caused the network connection to break. This happened because I was constructing a new AmazonS3Client object with every call to load a file, while the preferred use case is to create a long-lasting client object that survives across calls - or at least is guaranteed to be around during the entirety of the download. So, the simple remedy is to make sure a reference to the AmazonS3Client is kept around so that it doesn't get GC'd.
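A minimal sketch of that fix, assuming the surrounding service owns the client. FakeClient here is a stand-in for AmazonS3Client so the example compiles without the AWS SDK on the classpath; the shape of the fix is the field, not the stand-in:

```java
// Sketch of the fix: hold the client in a long-lived field instead of
// constructing one inside every download call. FakeClient stands in
// for AmazonS3Client so this compiles without the AWS SDK.
class FakeClient {
    int calls = 0;
    void getObject(String key) { calls++; }  // placeholder for the real download
}

class DownloadService {
    // A live reference for the service's whole lifetime; the client can
    // no longer be garbage collected in the middle of a transfer.
    private final FakeClient client = new FakeClient();

    void fetch(String key) {
        client.getObject(key);  // every call reuses the same client
    }

    int fetchCount() {
        return client.calls;
    }
}
```

As long as the DownloadService itself stays reachable, so does the client, which is exactly the guarantee the broken per-call construction was missing.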

A link on the AWS forums that helped me is here: https://forums.aws.amazon.com/thread.jspa?threadID=83326

answered Nov 12 '22 by Steve S.


The network is closing the connection before the client has received all the data, for one reason or another; that's what is going on.

Part of any HTTP response is the Content-Length header. Your code gets the headers, which say, in effect, "here's the data, and there's this much of it," and then the connection drops before the client has read all of it, so it bombs out with the exception.

I'd look at your OS, network, and JVM connection timeout settings (though the JVM generally inherits from the OS here). The key is to figure out which part of the stack is causing the problem. Is it an OS-level setting deciding not to wait any longer for packets? Is it a read with a timeout configured in your own code, which gives up after going too long without data from the server and drops the connection with an exception? And so on.

Your best bet is to snoop the packet traffic at a low level and trace backwards to see where the drop is happening, or to raise the timeouts in the things you can control: your software, the OS, and the JVM.
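On the JVM side, one place to raise those timeouts explicitly is on the connection itself. A hedged sketch using the standard java.net.URLConnection API (the AWS SDK exposes similar knobs through its ClientConfiguration, such as the socket timeout); the URL and timeout values are illustrative, not recommendations:

```java
import java.net.URL;
import java.net.URLConnection;

class TimeoutDemo {
    // Opens a connection with explicit timeouts instead of relying on
    // OS/JVM defaults. openConnection() performs no network I/O yet,
    // so the timeouts are in place before the first packet is sent.
    static URLConnection openWithTimeouts(String address) throws Exception {
        URLConnection conn = new URL(address).openConnection();
        conn.setConnectTimeout(10_000); // 10 s to establish the TCP connection
        conn.setReadTimeout(60_000);    // fail if 60 s pass with no data
        return conn;
    }
}
```

A read timeout of zero means "wait forever," which hides exactly the kind of drop described above, so an explicit value at least turns a silent stall into a visible SocketTimeoutException.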

answered Nov 12 '22 by Speckpgh


First of all, your code is operating entirely normally if (and only if) you suffer connectivity troubles between yourself and Amazon S3. As Michael Slade points out, standard connection-level debugging advice applies.

As to your actual source code, I note a few code smells you should be aware of. Annotating them directly in the source:

public static final void write(InputStream stream, OutputStream output) {

  byte[] buffer = new byte[1024]; // !! Abstract 1024 into a constant to make 
                                  //  this easier to configure and understand.

  int read = -1;

  try {

    while ((read = stream.read(buffer)) != -1) {
      output.write(buffer, 0, read);
    }

    stream.close(); // !! Unexpected side effects: closing of your passed in 
                    //  InputStream. This may have unexpected results if your
                    //  stream type supports reset, and currently carries no 
                    //  visible documentation.

    output.flush(); // !! Violation of RAII. Refactor this into a finally block, 
    output.close(); //  a la Reference 1 (below).

  } catch (IOException e) {
    throw new RuntimeException(e); // !! Possibly indicative of an outer 
                                   //   try-catch block for RuntimeException. 
                                   //   Consider keeping this as IOException.
  }
}

(Reference 1)

Otherwise, the code itself seems fine. IO exceptions should be expected occurrences when you're connecting to a fickle remote host, and your best course of action is to draft a sane policy to catch them and reconnect in these scenarios.
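Putting those annotations together, a reworked sketch of the copy method might look like the following. It deliberately leaves both streams open, on the assumption that the caller who opened them closes them (for example with try-with-resources):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    private static final int BUFFER_SIZE = 1024; // pulled out into a named constant

    /**
     * Copies every byte from stream to output. Closes neither argument:
     * the caller opened the streams, so the caller owns closing them
     * (ideally with try-with-resources). Propagates IOException instead
     * of wrapping it in RuntimeException.
     */
    static void write(InputStream stream, OutputStream output) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int read;
        while ((read = stream.read(buffer)) != -1) {
            output.write(buffer, 0, read);
        }
        output.flush();
    }
}
```

A caller would then do the resource management itself, e.g. `try (InputStream in = object.getObjectContent(); OutputStream out = new BufferedOutputStream(new FileOutputStream(file))) { StreamCopy.write(in, out); }`, so both streams are closed even when the copy throws.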

answered Nov 12 '22 by MrGomez