What are the different pros and cons for running (scheduled) background tasks and handling web requests on the same (Java) server?
Some points I thought about that are worth considering:
Resource usage patterns are very different between online, short-lived request-response operations (such as an API) and scheduled background tasks (such as maintenance jobs or data curation operations).
For this reason it is generally a good idea to isolate such tasks at the lowest possible level, running them on different VMs or even physical machines.
If the same JVM instance is used, then batch jobs that allocate a lot of memory and then release it will trigger garbage collection that pauses the execution of online requests, degrading response time.
This can be mitigated by running each type of operation on its own JVM, minimizing the impact of stop-the-world pauses.
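A minimal sketch of that separation, assuming a hypothetical com.example.batch.BatchRunner entry point: the request-serving process launches the batch work in its own JVM with its own heap and GC settings, so the batch job's stop-the-world pauses cannot stall request threads.

```java
import java.io.IOException;

// Launch the (hypothetical) BatchRunner class in its own JVM so its
// heap usage and GC pauses are isolated from the request-serving JVM.
public class BatchLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "java",
                "-Xms512m", "-Xmx2g",           // heap sized for the batch job only (example values)
                "-XX:+UseG1GC",                 // GC can be tuned independently of the API JVM
                "-cp", System.getProperty("java.class.path"),
                "com.example.batch.BatchRunner" // hypothetical batch entry point
        );
        pb.inheritIO();                         // forward the batch output to this process's logs
        Process batch = pb.start();
        int exitCode = batch.waitFor();
        System.out.println("Batch finished with exit code " + exitCode);
    }
}
```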
If the background tasks operate on the same data as the online requests, you can probably reuse at least some of the data-access and mapping layer code, potentially saving some work at the expense of introducing coupling at the data layer.
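As a rough illustration (the Order entity, OrderRepository interface and its methods are all made-up names), both the online handler and the maintenance job could share one repository; the mapping code is written once, but both sides are now coupled to the same schema.

```java
import java.util.List;

// Hypothetical entity and repository shared by the API and the batch job.
record Order(long id, String status) {}

interface OrderRepository {
    Order findById(long id);          // used by the online API
    List<Order> findStale(int days);  // used by the maintenance job
    void archive(Order order);
}

// Online path: short-lived, latency-sensitive lookups.
class OrderApiHandler {
    private final OrderRepository repo;
    OrderApiHandler(OrderRepository repo) { this.repo = repo; }
    Order getOrder(long id) { return repo.findById(id); }
}

// Batch path: reuses the same repository, so any schema change now affects both.
class OrderArchivingJob {
    private final OrderRepository repo;
    OrderArchivingJob(OrderRepository repo) { this.repo = repo; }
    void run() {
        for (Order o : repo.findStale(90)) {
            repo.archive(o);
        }
    }
}
```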
If the batch job is CPU- or memory-bound, it will affect the performance of the online requests unless you set limits on the CPU and memory share each process can use. This can be done at the VM, container or process level.
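When both workloads do end up in one process, one crude in-process mitigation, sketched below, is to confine batch work to a small pool of low-priority threads; this is only a scheduling hint and no substitute for hard limits at the container or VM level.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class BatchExecutorConfig {
    // Cap batch work at a couple of low-priority threads so request
    // threads keep the bulk of the CPU when both share one process.
    public static ExecutorService batchExecutor() {
        ThreadFactory lowPriority = runnable -> {
            Thread t = new Thread(runnable, "batch-worker");
            t.setPriority(Thread.MIN_PRIORITY); // hint only; hard limits belong at the container/VM level
            t.setDaemon(true);
            return t;
        };
        return Executors.newFixedThreadPool(2, lowPriority);
    }

    public static void main(String[] args) {
        ExecutorService batch = batchExecutor();
        batch.submit(() -> System.out.println("running maintenance job on " + Thread.currentThread().getName()));
        batch.shutdown();
    }
}
```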
If the batch job uses a lot of bandwidth, it will affect the online requests' ability to send content to clients. As with CPU and memory, bandwidth can also be throttled per application to mitigate this effect.
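A simple way to throttle the batch side, sketched below under the assumption that you control the stream the batch job writes to, is to wrap its output stream and sleep whenever it exceeds a bytes-per-second budget.

```java
import java.io.IOException;
import java.io.OutputStream;

// Wraps an OutputStream and sleeps as needed so the batch job's
// writes never exceed a configured bytes-per-second budget.
public class ThrottledOutputStream extends OutputStream {
    private final OutputStream delegate;
    private final long bytesPerSecond;
    private final long windowStart = System.nanoTime();
    private long bytesWritten = 0;

    public ThrottledOutputStream(OutputStream delegate, long bytesPerSecond) {
        this.delegate = delegate;
        this.bytesPerSecond = bytesPerSecond;
    }

    @Override
    public void write(int b) throws IOException {
        throttle(1);
        delegate.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        throttle(len);
        delegate.write(buf, off, len);
    }

    private void throttle(int bytes) throws IOException {
        bytesWritten += bytes;
        long elapsedNanos = System.nanoTime() - windowStart;
        long allowed = bytesPerSecond * elapsedNanos / 1_000_000_000L;
        if (bytesWritten > allowed) {
            long excess = bytesWritten - allowed;
            long sleepMillis = excess * 1000 / bytesPerSecond;
            try {
                Thread.sleep(Math.max(1, sleepMillis)); // stay under the average rate
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("throttled write interrupted", e);
            }
        }
    }
}
```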
Traditionally, client-facing applications such as APIs should be properly audited and hardened to avoid vulnerabilities. Scheduled background batch processes usually have a smaller attack surface, as they don't need to be exposed to clients and are therefore more difficult to compromise.
Depending on the nature of the deployment, of course, sharing the same infrastructure for both applications will increase the risk for the batch jobs.
To me, it really all depends on the kind of server you have and the processes that are executed in the background.
Data isolation would be the least of my concerns. If your different services operate on the same data, you'll have conflicts no matter what, even if they are not on the same server, and you'll need a consensus protocol such as Paxos. You can look into Raft for this, which is really well explained and already has several implementations.
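A full consensus protocol won't fit in a snippet, but for the common case of two services updating the same rows, a much simpler technique, optimistic locking with a version column, already detects conflicting writes; the orders table and version column below are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Optimistic-locking update: the write only succeeds if nobody else
// changed the row since we read it (the version column is hypothetical).
public class OptimisticUpdate {
    public static boolean updateStatus(Connection conn, long orderId,
                                       String newStatus, int expectedVersion) throws SQLException {
        String sql = "UPDATE orders SET status = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, newStatus);
            ps.setLong(2, orderId);
            ps.setInt(3, expectedVersion);
            return ps.executeUpdate() == 1; // false means a concurrent writer got there first
        }
    }
}
```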
If security is a concern for you, the most important thing to protect is obviously your data, stored in whatever database you currently have. The server hosting your database should be isolated from the one(s) running the web and background services, since that forces an attacker to compromise the exposed services in your first layer rather than being able to target your DB directly. See the recent attacks on exposed MongoDB servers.
Which brings me back to my first point, revolving around CPU, RAM and traffic surges. These points can only be answered by knowing the budget at your disposal, the traffic you expect and the kind of jobs you are running.
However, if you do want to host them on the same server and you are flexible about it, you could prevent any downtime/slowdown for your users during a traffic peak by pausing your background tasks, or allocating fewer resources to them, and resuming them once traffic stabilizes. You can also analyze your traffic patterns hour by hour and run the jobs only in the windows where you know they won't cause any issues.
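A sketch of that idea, assuming you can expose the current request rate through some metrics hook (the DoubleSupplier below is a stand-in for it): schedule the job every hour but skip the run whenever load is above a threshold.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

// Runs the maintenance job every hour, but skips the run whenever the
// current request load is above a threshold, effectively pausing batch
// work during traffic peaks and retrying on the next tick.
public class LoadAwareScheduler {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final DoubleSupplier requestsPerSecond; // hypothetical hook into your metrics
    private final double peakThreshold;
    private final Runnable maintenanceJob;

    public LoadAwareScheduler(DoubleSupplier requestsPerSecond, double peakThreshold, Runnable maintenanceJob) {
        this.requestsPerSecond = requestsPerSecond;
        this.peakThreshold = peakThreshold;
        this.maintenanceJob = maintenanceJob;
    }

    public void start() {
        scheduler.scheduleAtFixedRate(() -> {
            if (requestsPerSecond.getAsDouble() > peakThreshold) {
                System.out.println("Traffic peak, skipping this run"); // will try again next hour
                return;
            }
            maintenanceJob.run();
        }, 0, 1, TimeUnit.HOURS);
    }
}
```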