Pros and cons - running (scheduled) background tasks and web requests handling on same server [closed]

What are the pros and cons of running (scheduled) background tasks and handling web requests on the same (Java) server?

Some points I thought about:

  • How the Garbage Collector would operate
  • Data isolation
  • CPU / memory usage
  • Traffic surges
  • Security
asked Feb 06 '23 by Johnny


2 Answers

tl;dr

pros

  1. convenience: fewer machines to operate
  2. cost: sharing infrastructure keeps costs under control

cons

  1. complexity: managing different apps on the same server may not be obvious or simple
  2. scalability: scaling the API is simpler than scaling batch jobs
  3. availability: if you introduce HA you will have to implement some sort of locking mechanism to avoid batch job concurrency issues (see the locking sketch just after this list)
  4. security: sharing a server increases the attack surface
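
As a concrete illustration of the locking point in item 3, here is a minimal sketch of a database-backed lock that keeps two instances of the same scheduled job from running at once. The job_lock table and the method names are assumptions for illustration, and exact duplicate-key behavior depends on your JDBC driver:

    import java.sql.*;

    // Minimal sketch of a database-backed lock for scheduled jobs in an HA setup.
    // Assumes a table: CREATE TABLE job_lock (job_name VARCHAR(64) PRIMARY KEY);
    // whichever instance inserts the row first owns the lock, the others skip the run.
    public class JobLock {

        public static boolean tryAcquire(Connection conn, String jobName) throws SQLException {
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO job_lock (job_name) VALUES (?)")) {
                ps.setString(1, jobName);
                ps.executeUpdate();
                return true;  // we inserted the row, so we own the lock
            } catch (SQLIntegrityConstraintViolationException duplicate) {
                // another instance already holds the lock; note that some drivers
                // report duplicates via SQLState 23000/23505 and a plain SQLException
                return false;
            }
        }

        public static void release(Connection conn, String jobName) throws SQLException {
            try (PreparedStatement ps =
                     conn.prepareStatement("DELETE FROM job_lock WHERE job_name = ?")) {
                ps.setString(1, jobName);
                ps.executeUpdate();
            }
        }
    }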

Details

Resource usage patterns are very different between online, short-lived request-response operations, such as an API, and scheduled background tasks, such as maintenance jobs or data curation operations.

For this reason it is generally a good idea to isolate such tasks at the lowest possible level, running them on different VMs or even physical machines.

Garbage Collector perspective

If the same JVM instance is used, then batch jobs that consume lots of memory and then release it will cause garbage collection to pause the execution of the online requests, degrading response times.

This can be mitigated by running each type of operation on its own JVM, to minimize the impact of stop-the-world pauses.
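
A minimal sketch of that separation, assuming a hypothetical batch entry point com.example.BatchJob packaged in batch.jar; the heap and GC flags are illustrative, not a recommendation:

    import java.io.IOException;

    // Launch the batch job in its own JVM so its allocation spikes and
    // stop-the-world pauses cannot stall the request-serving JVM.
    public class BatchLauncher {
        public static void main(String[] args) throws IOException, InterruptedException {
            Process batch = new ProcessBuilder(
                    "java",
                    "-Xmx2g",           // cap the batch job's heap
                    "-XX:+UseG1GC",     // GC tuned independently of the API JVM
                    "-cp", "batch.jar",
                    "com.example.BatchJob")
                .inheritIO()
                .start();
            int exitCode = batch.waitFor();
            System.out.println("batch job exited with code " + exitCode);
        }
    }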

Data Isolation

If the background tasks operate on the same data as the online requests, then you can probably reuse at least some of the data access and mapping layer code, potentially saving some work at the expense of introducing coupling at the data layer.
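
For example (an illustrative sketch; the CustomerRepository interface and Customer record are assumptions, and the record syntax needs Java 16+): both the API module and the batch module can depend on one shared repository interface, so the mapping code is written once, at the price of coupling both applications to the same schema.

    import java.time.LocalDate;
    import java.util.List;
    import java.util.Optional;

    // One shared data-access interface used by both applications:
    // the API calls findById for lookups, the batch job calls findInactiveSince.
    public interface CustomerRepository {

        Optional<Customer> findById(long id);                // online request path
        List<Customer> findInactiveSince(LocalDate cutoff);  // batch job path
        void save(Customer customer);

        // Shared entity/mapping type; changing the schema now affects both apps.
        record Customer(long id, String name, LocalDate lastSeen) {}
    }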

CPU / Memory usage

If the batch job is CPU or memory bound, it will affect the performance of the online requests unless you set limits on the CPU / memory share that each process can use. This can be done at the VM, container, or process level.
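
At the process level, one crude mitigation in Java is to confine batch work to a small, low-priority thread pool; a sketch under those assumptions (pool size and priority are illustrative, and the effect of thread priorities is OS-dependent):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadFactory;

    // Confine batch work to one low-priority daemon thread so request-handling
    // threads keep most of the CPU. Priorities are only hints to the scheduler.
    public class ThrottledBatchExecutor {

        public static ExecutorService create() {
            ThreadFactory lowPriority = runnable -> {
                Thread t = new Thread(runnable, "batch-worker");
                t.setPriority(Thread.MIN_PRIORITY);
                t.setDaemon(true);
                return t;
            };
            // a single worker: the batch job can never saturate more than one core
            return Executors.newFixedThreadPool(1, lowPriority);
        }

        public static void main(String[] args) {
            ExecutorService batchPool = create();
            batchPool.submit(() -> System.out.println("running a batch step"));
            batchPool.shutdown();
        }
    }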

Traffic surges

If the batch job uses a lot of bandwidth, it will affect the online requests' ability to send content to clients. As with CPU and memory, bandwidth can also be throttled per application to mitigate this effect.
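
One simple form of application-level throttling, sketched with plain JDK I/O (the 1 MiB/s budget and chunk size are illustrative assumptions; a production system would more likely throttle at the OS or network layer):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Copy data in fixed-size chunks and sleep between chunks so the batch
    // job's average throughput stays under a configured byte-per-second budget.
    public class ThrottledCopy {
        private static final int CHUNK = 64 * 1024;              // 64 KiB per write
        private static final long BYTES_PER_SECOND = 1_048_576;  // ~1 MiB/s budget

        public static void copy(InputStream in, OutputStream out)
                throws IOException, InterruptedException {
            byte[] buf = new byte[CHUNK];
            long pauseMillis = CHUNK * 1000L / BYTES_PER_SECOND;  // time one chunk "costs"
            int read;
            while ((read = in.read(buf)) != -1) {
                out.write(buf, 0, read);
                Thread.sleep(pauseMillis);  // coarse pacing, fine for a batch job
            }
        }
    }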

Security

Client-facing applications such as APIs traditionally need to be properly audited and hardened to avoid vulnerabilities. Scheduled background batch processes usually have a smaller attack surface, as they don't need to be exposed to clients and are therefore more difficult to compromise.

Depending on the nature of the deployment, of course, sharing the same infrastructure for both applications will increase the risk for the batch jobs.

answered Apr 07 '23 by cjungel


To me, it really all depends on the kind of server you have and the processes that are executed in the background.

Data isolation would be the least of my concerns. If your different services operate on the same data, you'll have conflicts no matter what, even if they are not on the same server, and you'll need a consensus protocol such as Paxos. You can also look into Raft, which is really well explained and already has a few implementations.

If security is a concern for you, the most important thing to protect is obviously your data, stored in whatever database you currently have. The server hosting your database should be isolated from the one(s) running the web and background services, since this forces an attacker to compromise the exposed services in your first layer rather than targeting your DB directly. See the recent attacks on exposed MongoDB servers.

Which brings me back to my first point, revolving around CPU, RAM, and surges. These points can only be answered by knowing the budget at your disposal, the traffic you expect, and the kind of jobs you are running.

However, if you do want to host them on the same server and you are flexible about it, you can prevent downtime or slowdowns for your users during a traffic peak by pausing your background tasks, or allocating fewer resources to them, and resuming them once traffic stabilizes. You can also analyze your traffic patterns hour by hour and run the jobs only at times when you know there won't be any issues.
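
A minimal sketch of that idea (the 0.7-per-core threshold, the one-minute period, and runBatchStep are assumptions; getSystemLoadAverage() returns -1 on platforms where it is unsupported):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Check the system load before each run and defer the batch step
    // while the machine is busy serving web traffic.
    public class LoadAwareScheduler {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            double maxLoad = os.getAvailableProcessors() * 0.7;
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                double load = os.getSystemLoadAverage();  // -1 if unsupported
                if (load >= 0 && load > maxLoad) {
                    System.out.println("load is high, deferring batch step");
                    return;
                }
                System.out.println("load is low, running batch step");
                // runBatchStep();  // hypothetical batch work goes here
            }, 0, 1, TimeUnit.MINUTES);
        }
    }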

answered Apr 07 '23 by Preview