Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ways to buffer REST response

There's a REST endpoint, which serves large (tens of gigabytes) chunks of data to my application.
Application processes the data in it's own pace, and as incoming data volumes grow, I'm starting to hit REST endpoint timeout.
Meaning, processing speed is less then network throughoutput.
Unfortunately, there's no way to raise processing speed enough, as there's no "enough" - incoming data volumes may grow indefinitely.

I'm thinking of a way to store incoming data locally before processing, in order to release REST endpoint connection before timeout occurs.

What I've came up so far, is downloading incoming data to a temporary file and reading (processing) said file simultaneously using OutputStream/InputStream.
Sort of buffering, using a file.

This brings it's own problems:

  • what if processing speed becomes faster then downloading speed for some time and I get EOF?
  • file parser operates with ObjectInputStream and it behaves weird in cases of empty file/EOF
  • and so on

Are there conventional ways to do such a thing?
Are there alternative solutions?
Please provide some guidance.

Upd:

I'd like to point out: http server is out of my control.
Consider it to be a vendor data provider. They have many consumers and refuse to alter anything for just one.
Looks like we're the only ones to use all of their data, as our client app processing speed is far greater than their sample client performance metrics. Still, we can not match our app performance with network throughoutput.

Server does not support http range requests or pagination.
There's no way to divide data in chunks to load, as there's no filtering attribute to guarantee that every chunk will be small enough.

Shortly: we can download all the data in a given time before timeout occurs, but can not process it.
Having an adapter between inputstream and outpustream, to pefrorm as a blocking queue, will help a ton.

like image 389
miracle_the_V Avatar asked Apr 02 '18 15:04

miracle_the_V


People also ask

When should I use restful API?

The most common scenario of using REST APIs is to deliver static resource representations in XML or JSON. However, this architectural style allows users to download and run code in the form of Java applets or scripts (such as JavaScript).

What are REST endpoints?

A REST Service Endpoint is an endpoint which services a set of REST resources.


1 Answers

You're using something like new ObjectInputStream(new FileInputStream(..._) and the solution for EOF could be wrapping the FileInputStream first in an WriterAwareStream which would block when hitting EOF as long a the writer is writing.

Anyway, in case latency don't matter much, I would not bother start processing before the download finished. Oftentimes, there isn't much you can do with an incomplete list of objects.

Maybe some memory-mapped-file-based queue like Chronicle-Queue may help you. It's faster than dealing with files directly and may be even simpler to use.


You could also implement a HugeBufferingInputStream internally using a queue, which reads from its input stream, and, in case it has a lot of data, it spits them out to disk. This may be a nice abstraction, completely hiding the buffering.

There's also FileBackedOutputStream in Guava, automatically switching from using memory to using a file when getting big, but I'm afraid, it's optimized for small sizes (with tens of gigabytes expected, there's no point of trying to use memory).

like image 179
maaartinus Avatar answered Sep 22 '22 11:09

maaartinus