In my phone interview at one of the financial firms as an software architect, "design a cloud storage system like AWS S3".
, I listed requirements - CRUD Microservices on objects - Caching layer to improve performance - Deployment on PaaS - resiliency with failover - AAA support ( authorization, auditing, accounting/billing) - Administration microservices (user, project, lifecycle of object, SLA dashboard) - Metrics collection (Ops, Dev) - Security for service endpoints for admin UI
I defined basic APIs.
https://api.service.com/services/get Arugments object id, metadata return binary object
https://api.service.com/services/upload Arguments object returns object id
https://api.service.com/services/delete Arugments object id returns success/error
http://api.service.com/service/update-meta Arugments object id, metadata return success/error
I drew the picture on board with architecture and some COTS components i can use. below is the picture.
Interviewer did not ask me much questions, and hence I am bit worried that if I am on right track with my process. Pl provide your feedback..
Thanks in advance..
S3 is a distributed file storage facilitating storage of blob data. In simpler terms, it could be described as the folder on the cloud.
Amazon S3 is an object storage service that stores data as objects within buckets. An object is a file and any metadata that describes the file. A bucket is a container for objects. To store your data in Amazon S3, you first create a bucket and specify a bucket name and AWS Region.
With Amazon S3, you can store data across a range of different S3 storage classes purpose-built for specific use cases and access patterns: S3 Intelligent-Tiering, S3 Standard, S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-Infrequent Access (S3 One Zone-IA), S3 Glacier Instant Retrieval, S3 Glacier ...
There are a couple of areas of feedback that might be helpful:
The S3 API is a RESTful API these days (it used to support SOAP) and it represents each 'file' (really a blob of data indexed by a key) as an HTTP resource, where the key is the path in the resource's URI. Your API is more RPC, in that each HTTP resource represents an operation to be carried out and the key to the blob is one of the parameters.
Whether or not this is a good or bad thing depends on what you're trying to achieve and what architectural style you want to adopt (although I am a fan of REST, it doesn't mean you have to adopt it for all applications), however since you were asked to design a system like S3, your answer would have benefited from a clear argument as to why you chose NOT to use REST as S3 does.
Architecture diagrams tend to often be very high level - which is appropriate - but there is a tendency sometimes to just draw lines between boxes without being clear about what those lines mean. Does it mean there is a network connection between the infrastructure hosting those software components? Does it mean there is an information or data flow between those components?
When you a draw a line like in your diagram that has multiple boxes all joining together on the line, the implication is that there is some relationship between the boxes. When you add arrows, there is the further implication that the relationship follows the direction of the arrows. But there is no clarity about what that relationship is, or why the directionality is important.
One could infer from your diagram that the Memcache Cluster and the File Storage cluster are both sending data to the Metrics/SLA portal, but that they are not sending data to each other. Or that the ELB is not connected to the microservices. Clearly that is not the case.
Your diagram includes a Load Balancer (more physical) and also a separate box per microservice (could be physical or logical or software), where each microservice is responsible for a different type of operation. It is not clear if each microservice has it's own load balancer, or if the load balancer is a layer 7 balancer that can map paths to different front ends.
While architectures often focus on the internal structure of a system, it is also important to consider the system context - i.e. what are the important elements outside the system that the system needs to interract with? e.g. what are the expected clients and their methods of connectivity?
While the above feedback focussed on the method of communicating your, this is more about the actual design.
Lest this critique be thought too negative, it's worth saying that there is a lot to like in your approach - your assessment of the likely requirements is thorough, and great to see inclusion of security and also operational monitoring and sla's considered up front.
However, reviewing this, I'd wonder what kind of job it actually was - it looks more like the application for a cloud architect role, rather than a software architect role, for which I'd expect to see more discussion of packages, modules, assemblies, libraries and software components.
All of the above notwithstanding, it's also worth considering - what is an interviewer looking for if they ask this in an interview? Nobody expects you to propose an architecture in 15 minutes that can do what has taken a team of Amazon engineers and architects many years to build and refine! They are looking for clarity of thought and expression, thoroughness of examination, logical conclusions from clearly stated assumptions, and knowledge and awareness of industry standards and practices.
Hope this is helpful, and best of luck on the job hunt!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With