I am trying to decide how, when, and where to handle files uploaded by users. We are building a new microservice system (PHP + Linux) to be deployed in the coming months, and one key component is handling incoming files.
As I currently see it, there are three options (maybe more that I am not yet aware of):
(1)
[CLIENT:file] ->
[GATEWAY API
FILE STORAGE HANDLER ->
[a: MICROSERVICE-News]
[b: MICROSERVICE-Authors]
[c: MICROSERVICE-Logger]
] -> {response}
In this scenario, the Gateway API talks directly to a storage service (S3, GCS): it sets a filename, validates the file, and so on. Once it receives a storage confirmation, it passes the filename and other data on to the other microservices as needed. I see this as beneficial overall, since the file is handled as soon as it is received and can fail without impacting anything further down the line. However, it adds complexity to the gateway and can slow things down at peak times.
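To make option (1) concrete, here is a rough sketch of the gateway-side flow (in Python for brevity, though our stack is PHP): store the file first, then fan the metadata out to the other services. The `storage` and `services` objects and the key scheme are hypothetical placeholders, not a real API:

```python
import uuid
import posixpath

def make_storage_key(original_name: str, prefix: str = "uploads") -> str:
    """Derive a collision-safe storage key; never trust the client's filename."""
    ext = posixpath.splitext(posixpath.basename(original_name))[1].lower()
    return f"{prefix}/{uuid.uuid4().hex}{ext}"

def handle_upload(data: bytes, original_name: str, storage, services) -> dict:
    """Gateway-side flow for option (1): store first, then fan out metadata."""
    key = make_storage_key(original_name)
    storage.put(key, data)          # hypothetical S3/GCS client; fails fast here
    meta = {"key": key, "size": len(data)}
    for svc in services:            # News, Authors, Logger, ...
        svc.notify(meta)
    return meta
```

If `storage.put` raises, nothing downstream ever sees the upload, which is the "fail early" property described above.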
(2)
[CLIENT:file] ->
[GATEWAY API
[a: MICROSERVICE-Files]
[b: MICROSERVICE-News]
[c: MICROSERVICE-Authors]
[d: MICROSERVICE-Logger]
] -> {response}
In this scenario, the file is received by the Gateway API, which then passes it along to a dedicated Files microservice. This can be beneficial because it moves the responsibility out of the gateway and offers the flexibility to change the service internally without impacting the gateway. The major downside is that a single file is now handled twice, which requires additional computing resources.
(3)
[CLIENT:file] ->
[FILE API] -> {response} ->
[CLIENT] ->
[GATEWAY API
[a: MICROSERVICE-News]
[b: MICROSERVICE-Authors]
[c: MICROSERVICE-Logger]
] -> {response}
In this scenario, the client is responsible for sending the file to a separate File API and then passing the response (e.g. a file reference) along to the Gateway API. From a resource perspective this takes a huge load off the Gateway API, which only ever deals with data, never files. The major drawback is that a client can send faulty or malicious information to the Gateway API, so additional validation is required to ensure the referenced file is valid and actually exists. It also creates potential consistency problems between services and clients in the future.
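The extra validation that option (3) forces on the gateway might look like this (a Python sketch; the key format and `file_service` interface are assumptions, not a real API). The gateway never trusts the client-supplied reference: it checks the format first, then confirms existence with the file service:

```python
import re

# Hypothetical key format: "uploads/<32 hex chars><optional extension>".
KEY_RE = re.compile(r"^uploads/[0-9a-f]{32}(\.[a-z0-9]{1,8})?$")

def accept_file_reference(key: str, file_service) -> bool:
    """Gateway-side check for option (3): validate, then verify existence."""
    if not KEY_RE.fullmatch(key):    # rejects path traversal and junk input
        return False
    return file_service.exists(key)  # e.g. a HEAD request to the File API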
I may be missing other options and would love to hear about them. Does anyone have experience with this, and how did you approach handling files in a microservice architecture?
A basic principle of microservices is that each service manages its own data. Two services should not share a data store. Instead, each service is responsible for its own private data store, which other services cannot access directly.
File upload vulnerabilities arise when a web server allows users to upload files to its filesystem without sufficiently validating their name, type, contents, or size.
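Whichever option you pick, some service has to do that validation. A minimal sketch of the checks listed above (in Python; the allowlist, size limit, and magic-byte table are illustrative, not exhaustive):

```python
# Map of allowed extensions to the magic bytes the file must start with.
ALLOWED = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".gif": b"GIF8",
}
MAX_BYTES = 10 * 1024 * 1024  # 10 MiB; pick a limit that fits your use case

def validate_upload(filename: str, data: bytes) -> bool:
    """Check name/type (extension allowlist), size, and contents (magic bytes)."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED:
        return False                      # unknown or missing extension
    if not (0 < len(data) <= MAX_BYTES):  # reject empty and oversized files
        return False
    return data.startswith(ALLOWED[ext])  # contents must match the claimed type
```

Checking the extension alone is not enough, since a renamed executable would pass; comparing the leading bytes against the claimed type closes that gap for the common image formats.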
I think there is no one-size-fits-all answer to architectural questions like these; it always depends on your context and quality goals.
If you prefer encapsulation over performance, go with solution (2). You might also consider a client-side service discovery mechanism for the file service, instead of routing everything through a full-blown API gateway, to reduce the load on the gateway.
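Client-side discovery just means the caller resolves a file-service instance itself rather than letting the gateway route the request. A toy Python sketch; the hard-coded registry is a stand-in for whatever you actually use (Consul, etcd, DNS SRV records, ...):

```python
import random

# Hypothetical registry of healthy instances per logical service name.
REGISTRY = {
    "files": ["http://files-1:8080", "http://files-2:8080"],
}

def resolve(service: str) -> str:
    """Pick an instance of the named service; trivial random load balancing."""
    instances = REGISTRY.get(service)
    if not instances:
        raise LookupError(f"no healthy instances of {service!r}")
    return random.choice(instances)
```

The caller would then upload directly to `resolve("files")`, so file traffic never touches the gateway at all.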
If you prefer performance and the client is under your control, then you can go with solution (3).
I'd avoid solution (1), though. Microservices follow the principle of "smart endpoints and dumb pipes", meaning you should avoid putting business logic into infrastructure components like API gateways.