Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to deal with multiple database results from different servers for a request

I have cloud statistics (Structured data :: CSV) information; which i have to expose to administrator and user.

But for scalability; data collection will be collected by multiple machines (perf monitor) which is connected with individual DBs.

Now Manager (Mgr) is responsible of multicasting the request to all perf monitor; to collect the overall stats data to satisfy single UI request.

So questions are:

1) How will i make the mutiple monitor datas to be sorted based on the client request at Mgr. Each monitor may give the result as per the client request; but still how to merge multiple machines datas through java? Means How to perform in memory sql aggregate/scalar (e.g. Groupby, orderby, avg) function on all the results retrieved from multiple clusters at MGR. How do i implement DB sql aggregate/scalar functionality in java side, any known APIs? I think what i need is Reduce part of mapreduce technique in hadoop.

2) A request from UI (assume select count(*) from DB where Memory > 1000MB) have to be forwarded to multiple machines. Now how to send parallel requests to individual monitor and consume only when all the nodes are responded? Means how to wait User thread till consuming all the responses from perf monitors? How to trigger parallel REST request for single UI request on MGR.

3) Do I have to authenticate UI user at both Mgr and Perf monitor?

4) Are you thinking any drawback in this approach?

Notes:

1) I didn't go for NoSql because datas are structured and no joins are required.

2) I didn't go for node.js since i am new for that and may take more time on developing it. Also i am not developing any concurrent critical where single threaded are best suited. Here only push/retrieve of data is done. No modification happening.

3) I want individual DB for each monitor OR at-least two instances of DB's with multiple clusters for an instance to support faster accessing of real time BIG statistical data.

enter image description here

like image 438
Kanagavelu Sugumar Avatar asked Jun 30 '16 06:06

Kanagavelu Sugumar


People also ask

How can I query to retrieve data from multiple databases?

How can I query to retrieve data from multiple database servers. How can I run a query for many different databases in one query or in one time? Create and use “DB Links” in your SQL statement query. Toad for Oracle provides a GUI for creating DB Links. Please go to the Schema Browser | DB Links Tab | click on the ‘Create DB Link’ icon.

How to respond to multiple concurrent requests?

To respond to multiple concurrent requests you can have a thread pool or workers but how to maintain the consistency.Since the starting of this article i have been worried about maintaining consistency but what do i mean by consistency. Now suppose we sell iPhones on our e-commerce website and we have 10 pieces in our inventory.

What happens if you merge data from two different sources?

Data attributes in two different sources may conceptually represent the same information, but the format of their data values may be completely different. These structural and lexical differences in data may cause data loss and unfixable errors if the data is merged without being cleaned and standardized.

Why do we collect multiple data points per condition?

The multiple data points per condition are collected for two reasons. One is that the data should be more reliable this way. The other reason is, that some data points have to be discarded (subjects did not follow instruction correctly all times).


2 Answers

You want to scale your app, but you designed an inherent bottleneck. Namely: the Mgr.

What I would do is that I would split the Mgr into at least two parts. Front-end and backend. The front end could simply be an aggregator and/or controller which collects all the requests from all the different UI servers, timestamps those requests and put them in a queue (RabbitMQ, Kafka, Redis, whatever) making a message with the UI session ID or something similar which uniquely identifies the source of request. Then you just have to wait until you get a response on the queue (with a different topic of course).

Then on your backend (the other side of the queue) you can set up as many nodes as your load requires and make them performing the same task. Namely: pull off requests from the queue and call those performance monitoring APIs as necessary. You can scale these backend nodes as much as you wish since they don't have any state, all the state which needs to be stored is already part of the messages in the queue which will be automagically persisted for you by Redis/Kafka/RabbitMQ or whatever else you choose.

You can also use Apache Storm or something similar to do this for you in the backend, since it was designed for exactly this kind of applications.

Apache Storm has also built-in merging capability exposed through the Trident API.

Note on the authentication: you should authenticate the HTTP requests on the front-end side and then you will be all right. Just assign unique IDs (session IDs most probably) to the users connected to your mgr and use this internal ID when you forward your requests further to downstream servers.

Now how to send parallel requests to individual monitor and consume only when all the nodes are responded? Means how to wait User thread till consuming all the responses from perf monitors? How to trigger parallel REST request for single UI request on MGR.

Well if you have so many questions regarding handling user connections and serving those clients with responses then I would suggest to pick up a book on the Java servlets API. You might want to read this one for example: Servlet & JSP: A Tutorial (A Tutorial series). It is a bit outdated but well written.

But with all due respect, if you have so many questions on these quite fundamental topics, then it might be better to leave the architecture design to someone more experienced.

like image 199
Alma Alma Avatar answered Oct 19 '22 20:10

Alma Alma


Don't reinvent the wheel, use some good existing BAM and Database monitoring tools, they have lot of built in dashboards and statistics, easy to connect with Java and work-flows.

like image 42
Amit Mahajan Avatar answered Oct 19 '22 20:10

Amit Mahajan