 

What is a good way to send large data sets to a client through API requests?

A client's system will connect to our system via an API for a data pull. For now the data is stored in a data mart, and each request will return roughly 50,000 records.

I would like to know the most efficient way of delivering the payload which originates in a SQL Azure database.

The API will be RESTful. After the request is received, I was thinking the payload would be retrieved from the database, converted to JSON, and GZIP-encoded before being transferred over HTTP back to the client.
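To make that concrete, here is a minimal sketch of that flow, assuming a Flask endpoint and pyodbc against SQL Azure; the connection string, table name, and query are illustrative only:

    import gzip
    import json

    import pyodbc
    from flask import Flask, Response

    app = Flask(__name__)

    # Illustrative connection string -- substitute real SQL Azure credentials.
    CONN_STR = "Driver={ODBC Driver 17 for SQL Server};Server=tcp:myserver.database.windows.net;Database=mydb;Uid=...;Pwd=..."

    @app.route("/export")
    def export():
        conn = pyodbc.connect(CONN_STR)
        try:
            cursor = conn.cursor()
            cursor.execute("SELECT id, name FROM records")  # illustrative query
            columns = [col[0] for col in cursor.description]
            rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        finally:
            conn.close()

        # Serialize to JSON, gzip the body, and label it so the client
        # transparently decompresses it.
        body = gzip.compress(json.dumps(rows, default=str).encode("utf-8"))
        return Response(body, headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
        })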

I'm concerned about how much processing this may take with many clients connected, each pulling a lot of data.

Or would it be best to just return the results uncompressed, as plain text, to the client?

Suggestions welcome.

-- UPDATE --

To clarify, this is not a web client that is connecting. The connection is made by another application to receive a once-daily data dump, so no pagination is needed.

The data consists primarily of text with one binary field.
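That binary field is worth a moment's thought if JSON is used, since JSON has no raw-binary type: the usual move is to base64-encode it, at roughly 33% size overhead before compression. A minimal sketch, with a purely hypothetical record:

    import base64
    import json

    # Hypothetical record; JSON cannot carry raw bytes, so the binary
    # field is base64-encoded into text (~33% larger before compression).
    record = {"id": 1, "name": "example", "blob": b"\x00\x01\x02"}
    record["blob"] = base64.b64encode(record["blob"]).decode("ascii")
    print(json.dumps(record))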

asked Feb 13 '23 by ElHaix

1 Answer

First of all: do not optimize prematurely! That means: don't sacrifice the simplicity and maintainability of your code for a gain you don't even know you need.

Let's see. 50,000 records does not really tell us anything without knowing the size of each record. I would advise you to start from a basic implementation and optimize only when needed. So try this:

  1. Implement a simple JSON response with those 50,000 records, and try to call it from the consumer app. Measure the size of the data and the response time, and evaluate carefully whether this is really a problem for a once-a-day operation (see the first sketch after this list).

  2. If it is, turn on compression for that JSON response. This is usually a HUGE win with JSON because of all the repetitive text. One tip here: set the content type header to "application/javascript", since Azure has dynamic compression enabled by default for that content type (see the second sketch after this list). Again, try it and evaluate whether the data size or response time is still a problem.

  3. If it is still a problem, maybe it is time for some serialization optimization after all, but I would strongly recommend something standard and proven here (no custom CSV mess), for example Google Protocol Buffers: https://code.google.com/p/protobuf-net/ (see the third sketch after this list).
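For step 1, here is a sketch of how the consumer app might measure both numbers, assuming the Python requests library and the hypothetical /export endpoint from the question:

    import time

    import requests  # third-party HTTP client, assumed here

    start = time.perf_counter()
    resp = requests.get("https://api.example.com/export")  # hypothetical endpoint
    elapsed = time.perf_counter() - start

    # requests decompresses gzip transparently, so len(resp.content) is the
    # decompressed size; the on-the-wire size is in Content-Length when set.
    print(f"records: {len(resp.json())}")
    print(f"bytes:   {len(resp.content)}")
    print(f"seconds: {elapsed:.2f}")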
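For step 2, a sketch of the content-type tip, again with Flask and a stub standing in for the real database query:

    import json

    from flask import Flask, Response

    app = Flask(__name__)

    def fetch_rows():
        # Stand-in for the real database query.
        return [{"id": 1, "name": "example"}]

    @app.route("/export")
    def export():
        # Labeling the JSON body "application/javascript" lets Azure's IIS
        # dynamic compression (on by default for that content type) gzip it
        # in transit, with no compression code of our own.
        return Response(json.dumps(fetch_rows()),
                        content_type="application/javascript")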
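For step 3, the link above is the .NET port (protobuf-net); a rough Python equivalent using Google's official protobuf package is sketched below. The Record/Batch schema and the generated records_pb2 module are assumptions for illustration, not anything prescribed by the answer:

    # records.proto, compiled with:  protoc --python_out=. records.proto
    #
    #   syntax = "proto3";
    #   message Record {
    #     int64  id   = 1;
    #     string name = 2;
    #     bytes  blob = 3;   // the binary field travels as raw bytes, no base64
    #   }
    #   message Batch {
    #     repeated Record records = 1;
    #   }

    import records_pb2  # module generated by protoc from the schema above

    batch = records_pb2.Batch()
    rec = batch.records.add()
    rec.id = 1
    rec.name = "example"
    rec.blob = b"\x00\x01\x02"

    payload = batch.SerializeToString()           # compact binary wire format
    restored = records_pb2.Batch.FromString(payload)
    assert restored.records[0].name == "example"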

answered Feb 15 '23 by rouen