Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to store an Http Response that may contain binary data?

As I described in a previous question, I have an assignment to write a proxy server. It partially works now, but I still have a problem with handling of gzipped information. I store the HttpResponse in a String, and it appears I can't do that with gzipped content. However, the headers are text which I need to parse, and they all come from the same InputStream. My question is, what do I have to do in order to correctly handle binary responses, while still parsing the headers as strings?

>> Please see the edit below before you look at the code.

Here's the Response class implementation:

public class Response {
    private String fullResponse = "";
    private BufferedReader reader;
    private boolean busy = true;
    private int responseCode;
    private CacheControl cacheControl;

    public Response(String input) {
        this(new ByteArrayInputStream(input.getBytes()));
    }

    public Response(InputStream input) {
        reader = new BufferedReader(new InputStreamReader(input));
        try {
            while (!reader.ready());//wait for initialization.

            String line;
            while ((line = reader.readLine()) != null) {
                fullResponse += "\r\n" + line;

                if (HttpPatterns.RESPONSE_CODE.matches(line)) {
                    responseCode = (Integer) HttpPatterns.RESPONSE_CODE.process(line);
                } else if (HttpPatterns.CACHE_CONTROL.matches(line)) {
                    cacheControl = (CacheControl) HttpPatterns.CACHE_CONTROL.process(line);
                }
            }
            reader.close();
            fullResponse = "\r\n" + fullResponse.trim() + "\r\n\r\n";
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } 
        busy = false;
    }

    public CacheControl getCacheControl() {
        return cacheControl;
    }

    public String getFullResponse() {
        return fullResponse;
    }

    public boolean isBusy() {
        return busy;
    }

    public int getResponseCode() {
        return responseCode;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result
                + ((fullResponse == null) ? 0 : fullResponse.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (!(obj instanceof Response))
            return false;
        Response other = (Response) obj;
        if (fullResponse == null) {
            if (other.fullResponse != null)
                return false;
        } else if (!fullResponse.equals(other.fullResponse))
            return false;
        return true;
    }

    @Override
    public String toString() {
        return "Response\n==============================\n" + fullResponse;
    }
}

And here's HttpPatterns:

public enum HttpPatterns {
    RESPONSE_CODE("^HTTP/1\\.1 (\\d+) .*$"),
    CACHE_CONTROL("^Cache-Control: (\\w+)$"),
    HOST("^Host: (\\w+)$"),
    REQUEST_HEADER("(GET|POST) ([^\\s]+) ([^\\s]+)$"),
    ACCEPT_ENCODING("^Accept-Encoding: .*$");

    private final Pattern pattern;

    HttpPatterns(String regex) {
        pattern = Pattern.compile(regex);
    }

    public boolean matches(String expression) {
        return pattern.matcher(expression).matches();
    }

    public Object process(String expression) {
        Matcher matcher = pattern.matcher(expression);
        if (!matcher.matches()) {
            throw new RuntimeException("Called `process`, but the expression doesn't match. Call `matches` first.");
        }

        if (this == RESPONSE_CODE) {
            return Integer.parseInt(matcher.group(1));
        } else if (this == CACHE_CONTROL) {
            return CacheControl.parseString(matcher.group(1));
        } else if (this == HOST) {
            return matcher.group(1);
        } else if (this == REQUEST_HEADER) {
            return new RequestHeader(RequestType.parseString(matcher.group(1)), matcher.group(2), matcher.group(3));
        } else { //never happens
            return null;
        }
    }


}

EDIT

I tried implementing according the suggestions, but it's not working and I'm becoming desperate. When I try to view an image I get the following message from the browser:

The image “http://www.google.com/images/logos/ps_logo2.png” cannot be displayed because it contains errors.

Here's the log:

Request
==============================

GET http://www.google.com/images/logos/ps_logo2.png HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Cookie: PREF=ID=31f95dd7f42dfc7d:TM=1303507626:LM=1303507626:S=D4kIZ6rGFrlOUWlm


Not Reading from the Cache!!!!
I am going to try to connect to: www.google.com at port 80
Connected.
Writing to the server's buffer...
flushed.
Getting a response...
Got a binary response!


contentLength = 26209; headers.length() = 312; responseLength = 12136; fullResponse length = 12136


Got a response!

Writing to the Cache!!!!
I am going to write the following response:

HTTP/1.1 200 OK
Content-Type: image/png
Last-Modified: Thu, 05 Aug 2010 22:54:44 GMT
Date: Wed, 04 May 2011 15:05:30 GMT
Expires: Wed, 04 May 2011 15:05:30 GMT
Cache-Control: private, max-age=31536000
X-Content-Type-Options: nosniff
Server: sffe
Content-Length: 26209
X-XSS-Protection: 1; mode=block

 Response body is binary and was truncated.
Finished with request!

Here's the new Response class:

public class Response {
    private String headers = "";
    private BufferedReader reader;
    private boolean busy = true;
    private int responseCode;
    private CacheControl cacheControl;
    private InputStream fullResponse;
    private ContentEncoding encoding = ContentEncoding.TEXT;
    private ContentType contentType = ContentType.TEXT;
    private int contentLength;

    public Response(String input) {
        this(new ByteArrayInputStream(input.getBytes()));
    }

    public Response(InputStream input) {

        ByteArrayOutputStream tempStream = new ByteArrayOutputStream();
        InputStreamReader inputReader = new InputStreamReader(input);
        try {
            while (!inputReader.ready());
            int responseLength = 0;
            while (inputReader.ready()) {
                tempStream.write(inputReader.read());
                responseLength++;
            }
            /*
             * Read the headers
             */
            reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(tempStream.toByteArray())));
            while (!reader.ready());//wait for initialization.

            String line;
            while ((line = reader.readLine()) != null) {
                headers += "\r\n" + line;

                if (HttpPatterns.RESPONSE_CODE.matches(line)) {
                    responseCode = (Integer) HttpPatterns.RESPONSE_CODE.process(line);
                } else if (HttpPatterns.CACHE_CONTROL.matches(line)) {
                    cacheControl = (CacheControl) HttpPatterns.CACHE_CONTROL.process(line);
                } else if (HttpPatterns.CONTENT_ENCODING.matches(line)) {
                    encoding = (ContentEncoding) HttpPatterns.CONTENT_ENCODING.process(line);
                } else if (HttpPatterns.CONTENT_TYPE.matches(line)) {
                    contentType = (ContentType) HttpPatterns.CONTENT_TYPE.process(line);
                } else if (HttpPatterns.CONTENT_LENGTH.matches(line)) {
                    contentLength = (Integer) HttpPatterns.CONTENT_LENGTH.process(line);
                } else if (line.isEmpty()) {
                    break;
                }
            }

            InputStreamReader streamReader = new InputStreamReader(new ByteArrayInputStream(tempStream.toByteArray()));
            while (!reader.ready());//wait for initialization.
            //Now let's get the rest
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int counter = 0;
            while (streamReader.ready() && counter < (responseLength - contentLength)) {
                out.write((char) streamReader.read());
                counter++;
            }
            if (encoding == ContentEncoding.BINARY || contentType == ContentType.BINARY) {
                System.out.println("Got a binary response!");
                while (streamReader.ready()) {
                    out.write(streamReader.read());
                }
            } else {
                System.out.println("Got a text response!");
                while (streamReader.ready()) {
                    out.write((char) streamReader.read());
                }
            }
            fullResponse = new ByteArrayInputStream(out.toByteArray());

            System.out.println("\n\ncontentLength = " + contentLength + 
                    "; headers.length() = " + headers.length() + 
                    "; responseLength = " + responseLength + 
                    "; fullResponse length = " + out.toByteArray().length + "\n\n");

        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } 
        busy = false;
    }

}

and here's the ProxyServer class:

class ProxyServer {
    public void start() {
        while (true) {
            Socket serverSocket;
            Socket clientSocket;
            OutputStreamWriter toClient;
            BufferedWriter toServer;
            try {
                //The client is meant to put data on the port, read the socket.
                clientSocket = listeningSocket.accept();
                Request request = new Request(clientSocket.getInputStream());
                //System.out.println("Accepted a request!\n" + request);
                while(request.busy);
                //Make a connection to a real proxy.
                //Host & Port - should be read from the request
                URL url = null;
                try {
                    url = new URL(request.getRequestURL());
                } catch (MalformedURLException e){
                    url = new URL("http:\\"+request.getRequestHost()+request.getRequestURL());
                }

                System.out.println(request);

                //remove entry from cache if needed
                if (!request.getCacheControl().equals(CacheControl.CACHE) && cache.containsRequest(request)) {
                    cache.remove(request);
                }

                Response response = null;

                if (request.getRequestType() == RequestType.GET && request.getCacheControl().equals(CacheControl.CACHE) && cache.containsRequest(request)) {
                    System.out.println("Reading from the Cache!!!!");
                    response = cache.get(request);
                } else {
                    System.out.println("Not Reading from the Cache!!!!");
                    //Get the response from the destination
                    int remotePort = (url.getPort() == -1) ? 80 : url.getPort();
                    System.out.println("I am going to try to connect to: " + url.getHost() + " at port " + remotePort);
                    serverSocket = new Socket(url.getHost(), remotePort);
                    System.out.println("Connected.");
                    serverSocket.setSoTimeout(50000);

                    //write to the server - keep it open.
                    System.out.println("Writing to the server's buffer...");
                    toServer = new BufferedWriter(new OutputStreamWriter(serverSocket.getOutputStream()));
                    toServer.write(request.getFullRequest());
                    toServer.flush();
                    System.out.println("flushed.");

                    System.out.println("Getting a response...");
                    response = new Response(serverSocket.getInputStream());
                    //System.out.println("Got a response!\n" + response);
                    System.out.println("Got a response!\n");
                    //wait for the response
                    while(response.isBusy());
                }

                if (request.getRequestType() == RequestType.GET && request.getCacheControl().equals(CacheControl.CACHE) && response.getResponseCode() == 200) {
                    System.out.println("Writing to the Cache!!!!");
                    cache.put(request, response);
                }
                else System.out.println("Not Writing to the Cache!!!!");
                response = filter.filter(response);

                // Return the response to the client
                toClient = new OutputStreamWriter(clientSocket.getOutputStream());
                System.out.println("I am going to write the following response:\n" + response);
                BufferedReader responseReader = new BufferedReader(new InputStreamReader(response.getFullResponse()));
                while (responseReader.ready()) {
                    toClient.write(responseReader.read());
                }
                toClient.flush();
                toClient.close();
                clientSocket.close();
                System.out.println("Finished with request!");

            } catch (IOException e) {
                e.printStackTrace();
                continue;
            }
        }
   }
}

I would appreciate any and all feedback/insight/suggestion regarding how to solve this, and would of course prefer some actual code.

like image 379
Amir Rachum Avatar asked Apr 25 '11 10:04

Amir Rachum


People also ask

Can you send binary data over HTTP?

HTTP is perfectly capable of handling binary data: images are sent over HTTP all the time, and they're binary. People upload and download files of arbitrary data types all the time with no problem.

Which data type are used for storing data in binary format?

Store raw-byte data, such as IP addresses, up to 65000 bytes. Data types BINARY and BINARY VARYING ( VARBINARY ) are collectively referred to as binary string types and the values of binary string types are referred to as binary strings. A binary string is a sequence of octets or bytes.

Can we send binary data in JSON?

The JSON format natively doesn't support binary data. The binary data has to be escaped so that it can be placed into a string element (i.e. zero or more Unicode chars in double quotes using backslash escapes) in JSON.

Is binary data allowed in GET method?

When a GET method returns binary data, the Swagger document that is generated by IBM® Cúram Social Program Management specifies the Response Content type as anything other than application/json. This response indicates that binary content and not JSON content is provided in the response.


2 Answers

Store it in a byte array:

byte[] bufer = new byte[???];

A more detailed process:

  • Create a buffer large enough for the response header (and drop exception if it is bigger).
  • Read bytes to the buffer until you find \r\n\r\n in the buffer. You can write a helper function for example static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle)
  • When you encounter the end of header, create a strinform the first n bytes of the buffer. You can then use RegEx on this strng (also note that RegEx is not the best method to parse HTTPeaders).
  • Be prepared that the buffer will contain additional data after the header, which are the first bytes of the response body. You have to copy these bytes to the output stream or output file or output buffer.
  • Read the rest of the response body. (Until content-length is read or stream is closed).

Edit:

You are not following these steps I suggested. inputReader.ready() is a wrong way to detect the phases of the response. There is no guarantee that the header will be sent in a single burst.

I tried to write a schematics in code (except the arrayIndexOf) function.

InputStream is;

// Create a buffer large enough for the response header (and drop exception if it is bigger).
byte[] headEnd = {13, 10, 13, 10}; // \r \n \r \n
byte[] buffer = new byte[10 * 1024];
int length = 0;

// Read bytes to the buffer until you find `\r\n\r\n` in the buffer. 
int bytes = 0;
int pos;
while ((pos = arrayIndexOf(buffer, 0, length, headEnd)) == -1 && (bytes = is.read(buffer, length, buffer.length() - length)) > -1) {
    length += bytes;

    // buffer is full but have not found end siganture
    if (length == buffer.length())
        throw new RuntimeException("Response header too long");
}

// pos contains the starting index of the end signature (\r\n\r\n) so we add 4 bytes
pos += 4;

// When you encounter the end of header, create a strinform the first *n* bytes
String header = new String(buffer, 0, pos);

System.out.println(header);

// Be prepared that the buffer will contain additional data after the header
// ... so we process it
System.out.write(buffer, pos, length - pos);

// process the rest until connection is closed
while (bytes = is.read(buffer, 0, bufer.length())) {
    System.out.write(buffer, 0, bytes);
}

The arrayIndexOf method could look something like this: (there are probably faster versions)

public static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle) {
    for (int i=offset; i<offset+length-nedle.length(); i++) {
        boolean match = false;
        for (int j=0; j<needle.length(); j++) {
            match = haystack[i + j] == needle[j];
            if (!match)
                break;
        }
        if (match)
            return i;
    }
    return -1;
}
like image 129
vbence Avatar answered Oct 15 '22 05:10

vbence


You basically need to parse the response headers as text, and the rest as binary. It's slightly tricky to do so, as you can't just create an InputStreamReader around the stream - that will read more data than you want. You'll quite possibly need to read data into a byte array and then call Encoding.GetString manually. Alternatively, if you've read data into a byte array already you could always create a ByteArrayInputStream around that, then an InputStreamReader on top... but you'll need to work out how far the headers go before you get to the body of the response, which you should keep as binary data.

like image 41
Jon Skeet Avatar answered Oct 15 '22 05:10

Jon Skeet