Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Interview assignment - design a system like S3

In my phone interview at one of the financial firms as an software architect, "design a cloud storage system like AWS S3".

Here is what I answered, Would you please help with your critiques & comments and on my approach. I would like to improve based on your feedback.

First

, I listed requirements - CRUD Microservices on objects - Caching layer to improve performance - Deployment on PaaS - resiliency with failover - AAA support ( authorization, auditing, accounting/billing) - Administration microservices (user, project, lifecycle of object, SLA dashboard) - Metrics collection (Ops, Dev) - Security for service endpoints for admin UI

Second,

I defined basic APIs.
https://api.service.com/services/get Arugments object id, metadata return binary object https://api.service.com/services/upload Arguments object returns object id https://api.service.com/services/delete Arugments object id returns success/error http://api.service.com/service/update-meta Arugments object id, metadata return success/error

Third,

I drew the picture on board with architecture and some COTS components i can use. below is the picture. enter image description here

Interviewer did not ask me much questions, and hence I am bit worried that if I am on right track with my process. Pl provide your feedback..

Thanks in advance..

like image 246
nuvatech Avatar asked Aug 19 '16 15:08

nuvatech


People also ask

What is S3 in system design?

S3 is a distributed file storage facilitating storage of blob data. In simpler terms, it could be described as the folder on the cloud.

How do you explain S3?

Amazon S3 is an object storage service that stores data as objects within buckets. An object is a file and any metadata that describes the file. A bucket is a container for objects. To store your data in Amazon S3, you first create a bucket and specify a bucket name and AWS Region.

What are different types of Amazon S3 operations?

With Amazon S3, you can store data across a range of different S3 storage classes purpose-built for specific use cases and access patterns: S3 Intelligent-Tiering, S3 Standard, S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-Infrequent Access (S3 One Zone-IA), S3 Glacier Instant Retrieval, S3 Glacier ...


1 Answers

There are a couple of areas of feedback that might be helpful:

1. Comparison with S3's API

The S3 API is a RESTful API these days (it used to support SOAP) and it represents each 'file' (really a blob of data indexed by a key) as an HTTP resource, where the key is the path in the resource's URI. Your API is more RPC, in that each HTTP resource represents an operation to be carried out and the key to the blob is one of the parameters.

Whether or not this is a good or bad thing depends on what you're trying to achieve and what architectural style you want to adopt (although I am a fan of REST, it doesn't mean you have to adopt it for all applications), however since you were asked to design a system like S3, your answer would have benefited from a clear argument as to why you chose NOT to use REST as S3 does.

2. Lines connecting things

Architecture diagrams tend to often be very high level - which is appropriate - but there is a tendency sometimes to just draw lines between boxes without being clear about what those lines mean. Does it mean there is a network connection between the infrastructure hosting those software components? Does it mean there is an information or data flow between those components?

When you a draw a line like in your diagram that has multiple boxes all joining together on the line, the implication is that there is some relationship between the boxes. When you add arrows, there is the further implication that the relationship follows the direction of the arrows. But there is no clarity about what that relationship is, or why the directionality is important.

One could infer from your diagram that the Memcache Cluster and the File Storage cluster are both sending data to the Metrics/SLA portal, but that they are not sending data to each other. Or that the ELB is not connected to the microservices. Clearly that is not the case.

3. Mixing Physical, Logical, Network & Software Architecture

  • General Type of Architecture
    • Logical Architecture - tends to be more focussed on information flows between areas of functional responsibility
    • Physical Architecture - tends to be more focussed on deployable components, such as servers, VMs, containers, but I also group installable software packages here, as a running executable process may host multiple elements from the logical architecture
  • Specific Types of Architecture
    • Network Architecture - focuses on network connectivity between machines and devices - may reference VLANs, IP ranges, switches, routers etc.
    • Software Architecture - focuses on the internal structures of a software program design - may talk about classes, modules, packages etc.

Your diagram includes a Load Balancer (more physical) and also a separate box per microservice (could be physical or logical or software), where each microservice is responsible for a different type of operation. It is not clear if each microservice has it's own load balancer, or if the load balancer is a layer 7 balancer that can map paths to different front ends.

4. Missing Context

While architectures often focus on the internal structure of a system, it is also important to consider the system context - i.e. what are the important elements outside the system that the system needs to interract with? e.g. what are the expected clients and their methods of connectivity?

5. Actual Architectural Design

While the above feedback focussed on the method of communicating your, this is more about the actual design.

  • COTS products - did you talk about alternatives and why you selected the one you chose? Or is it just the only one you know. Awareness of the options and ability to select the appropriate option for a given purpose is valuable.
  • Caching - you have caching in front of the file storage, but nothing in front of the microservices (edge cache, or front end reverse proxy) - assuming the microservices are adding some value to the process, caching their results might also be useful
  • Redundancy and durability of data - while you talk about resiliency to failover, data redundancy and durability of the data storage is a key requirement in something like this and some explicit reference to how that would be achieved would be useful. Note this is slightly different to availability of services.
  • Performance - you talk about introducing a caching layer to improve performance, but don't qualify the actual performance requirements - 100's of objects stored or retrieved per second, 1000's or millions? You need to know that to know what to build in
  • Global Access - S3 is a multi-region/multi-datacentre solution - your architecture does not reference any aspect of multi-datacentre such as replication of the stored objects and metadata
  • Security - you reference requirements around AAA but your proposed solution doesn't define which component is responsible for security, and at which layer or at what point in the request path a request is verified and accepted or rejected

6. The Good

Lest this critique be thought too negative, it's worth saying that there is a lot to like in your approach - your assessment of the likely requirements is thorough, and great to see inclusion of security and also operational monitoring and sla's considered up front.

However, reviewing this, I'd wonder what kind of job it actually was - it looks more like the application for a cloud architect role, rather than a software architect role, for which I'd expect to see more discussion of packages, modules, assemblies, libraries and software components.

All of the above notwithstanding, it's also worth considering - what is an interviewer looking for if they ask this in an interview? Nobody expects you to propose an architecture in 15 minutes that can do what has taken a team of Amazon engineers and architects many years to build and refine! They are looking for clarity of thought and expression, thoroughness of examination, logical conclusions from clearly stated assumptions, and knowledge and awareness of industry standards and practices.

Hope this is helpful, and best of luck on the job hunt!

like image 71
Chris Simon Avatar answered Sep 29 '22 10:09

Chris Simon