Interview assignment - design a system like S3

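In a phone interview for a software architect position at one of the financial firms, I was asked to "design a cloud storage system like AWS S3". Here is what I answered. Would you please critique and comment on my approach? I would like to improve based on your feedback.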

First, I listed the requirements:
- CRUD microservices on objects
- Caching layer to improve performance
- Deployment on PaaS
- Resiliency with failover
- AAA support (authorization, auditing, accounting/billing)
- Administration microservices (user, project, lifecycle of object, SLA dashboard)
- Metrics collection (Ops, Dev)
- Security for service endpoints for the admin UI

Second, I defined the basic APIs:
- https://api.service.com/services/get - arguments: object id, metadata; returns: binary object
- https://api.service.com/services/upload - arguments: object; returns: object id
- https://api.service.com/services/delete - arguments: object id; returns: success/error
- http://api.service.com/service/update-meta - arguments: object id, metadata; returns: success/error

Third, I drew the architecture on the board, along with some COTS components I could use. Below is the picture: (diagram: https://i.stack.imgur.com/jg09S.png)

The interviewer did not ask me many questions, so I am a bit worried about whether I am on the right track with my process. Please provide your feedback.

Thanks in advance.


asked Aug 19 '16 15:08

nuvatech

1 Answer

There are a few areas of feedback that might be helpful:

1. Comparison with S3's API

The S3 API is a RESTful API these days (it used to support SOAP), and it represents each 'file' (really a blob of data indexed by a key) as an HTTP resource, where the key is the path in the resource's URI. Your API is closer to RPC style, in that each HTTP resource represents an operation to be carried out and the key to the blob is passed as one of the parameters.

Whether this is good or bad depends on what you're trying to achieve and which architectural style you want to adopt (although I am a fan of REST, that doesn't mean you have to adopt it for every application). However, since you were asked to design a system like S3, your answer would have benefited from a clear argument for why you chose NOT to use REST as S3 does.
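To make the contrast concrete, here is a minimal sketch of the two styles side by side. It assumes hypothetical in-memory stores and toy dispatch functions (`rpc_call`, `rest_call`), not a real web framework and not S3's actual implementation. In the RPC version the URL names the operation and the object id travels as a parameter, mirroring the endpoints in the question; in the resource version the path (`/bucket/key`) is the object and the HTTP method selects the action.

```python
import itertools
from typing import Dict, Optional, Tuple

# (status code, response body) returned by both toy dispatchers.
Response = Tuple[int, Optional[bytes]]

# Hypothetical in-memory stores; a real service would sit on replicated storage.
rpc_store: Dict[str, bytes] = {}
rest_store: Dict[str, bytes] = {}
_next_id = itertools.count(1)


def rpc_call(operation: str, params: Dict[str, object]) -> Response:
    """RPC style: the URL path names the operation; the object id is a parameter."""
    if operation == "/services/upload":
        object_id = f"obj-{next(_next_id)}"  # server assigns the id
        rpc_store[object_id] = bytes(params["object"])
        return 200, object_id.encode()
    if operation == "/services/get":
        data = rpc_store.get(str(params["object_id"]))
        return (200, data) if data is not None else (404, None)
    if operation == "/services/delete":
        existed = rpc_store.pop(str(params["object_id"]), None) is not None
        return (204, None) if existed else (404, None)
    return 400, None  # unknown operation: every new action needs a new endpoint


def rest_call(method: str, path: str, body: Optional[bytes] = None) -> Response:
    """Resource style: the path (/bucket/key) is the object; the method picks the action."""
    if method == "PUT":
        rest_store[path] = body if body is not None else b""
        return 200, None
    if method == "GET":
        data = rest_store.get(path)
        return (200, data) if data is not None else (404, None)
    if method == "DELETE":
        rest_store.pop(path, None)  # idempotent in this sketch: repeating it is harmless
        return 204, None
    return 405, None  # method not allowed on this resource
```

One practical argument for the resource style in a storage API is that a `GET` on an object URL has standard HTTP semantics (safe, potentially cacheable), so existing proxies and caches can understand it without knowing anything about your operations.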

2. Lines connecting things

Architecture diagrams are often very high level, which is appropriate, but there is sometimes a tendency to just draw lines between boxes without being clear about what those lines mean. Does a line mean there is a network connection between the infrastructure hosting those software components? Does it mean there is an information or data flow between those components?

When you draw a line, as in your diagram, with multiple boxes all joining it, the implication is that there is some relationship between those boxes. When you add arrows, there is the further implication that the relationship follows the direction of the arrows. But there is no clarity about what that relationship is, or why the direction matters.

One could infer from your diagram that the Memcache Cluster and the File Storage cluster are both sending data to the Metrics/SLA portal, but that they are not sending data to each other. Or that the ELB is not connected to the microservices. Clearly that is not the case.
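One way to avoid ambiguous lines is to write each connection down, with an explicit direction and meaning, before drawing it. The sketch below is a hypothetical illustration: the component names are taken loosely from the diagram discussed here ("Get microservice" is an assumed name), and the `Kind` categories are an assumption, not a standard notation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Kind(Enum):
    NETWORK = "network connection"        # hosts can reach each other
    REQUEST = "synchronous request"       # caller -> callee, waits for a reply
    DATA_FLOW = "asynchronous data flow"  # e.g. metrics pushed to a collector


@dataclass(frozen=True)
class Connection:
    source: str
    target: str
    kind: Kind
    label: str  # what actually travels along the line

    def describe(self) -> str:
        return f"{self.source} -> {self.target} [{self.kind.value}]: {self.label}"


# Hypothetical connections for a diagram like the one in the question.
CONNECTIONS: List[Connection] = [
    Connection("ELB", "Get microservice", Kind.REQUEST, "HTTP GET forwarded by path"),
    Connection("Get microservice", "Memcache Cluster", Kind.REQUEST, "cache lookup by object id"),
    Connection("Get microservice", "File Storage cluster", Kind.REQUEST, "read on cache miss"),
    Connection("File Storage cluster", "Metrics/SLA portal", Kind.DATA_FLOW, "latency and error metrics"),
]


def unlabeled(connections: List[Connection]) -> List[Connection]:
    """Return connections whose meaning was left blank, i.e. 'just a line'."""
    return [c for c in connections if not c.label.strip()]


if __name__ == "__main__":
    for c in CONNECTIONS:
        print(c.describe())
```

Each printed line can then become a labelled arrow on the diagram, so a reader no longer has to guess whether a line is a network link or a data flow.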

3. Mixing Physical, Logical, Network & Software Architecture

- General Types of Architecture
  - Logical Architecture - tends to be more focussed on information flows between areas of functional responsibility
  - Physical Architecture - tends to be more focussed on deployable components such as servers, VMs and containers; I also group installable software packages here, as a running executable process may host multiple elements from the logical architecture
- Specific Types of Architecture
  - Network Architecture - focuses on network connectivity between machines and devices; may reference VLANs, IP ranges, switches, routers, etc.
  - Software Architecture - focuses on the internal structure of a software program design; may talk about classes, modules, packages, etc.

Your diagram includes a Load Balancer (more physical) and also a separate box per microservice (which could be physical, logical or software), where each microservice is responsible for a different type of operation. It is not clear whether each microservice has its own load balancer, or whether the load balancer is a layer 7 balancer that can map URL paths to different front ends.
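To illustrate the layer 7 option, here is a minimal sketch of path-based routing, assuming hypothetical backend pool names: one balancer inspects the HTTP request path, picks the pool for that microservice by longest matching prefix, and round-robins within the pool. A layer 4 balancer only sees TCP/IP information and could not make this choice. This is an illustration of the technique, not the configuration of any particular load balancer product.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

# Hypothetical backend pools, one per microservice.
POOLS: Dict[str, List[str]] = {
    "/services/get": ["get-1:8080", "get-2:8080"],
    "/services/upload": ["upload-1:8080"],
    "/services/delete": ["delete-1:8080"],
}


class PathRouter:
    """Layer 7 routing: choose a pool by longest matching path prefix, then round-robin."""

    def __init__(self, pools: Dict[str, List[str]]) -> None:
        # Longest prefixes first, so a specific rule wins over a more general one.
        self._rules: List[Tuple[str, Iterator[str]]] = sorted(
            ((prefix, itertools.cycle(backends)) for prefix, backends in pools.items()),
            key=lambda rule: len(rule[0]),
            reverse=True,
        )

    def route(self, path: str) -> str:
        for prefix, backends in self._rules:
            # Match the prefix exactly or as a whole path segment.
            if path == prefix or path.startswith(prefix + "/"):
                return next(backends)
        raise LookupError(f"no backend pool for {path}")
```

Drawing the balancer this way (one box, with labelled routes to each pool) answers the question the diagram leaves open.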

4. Missing Context

While architectures often focus on the internal structure of a system, it is also important to consider the system context, i.e. the important elements outside the system that it needs to interact with. For example, who are the expected clients, and how do they connect?

5. Actual Architectural Design

While the feedback above focussed on how you communicated your design, this section is about the actual design.

- COTS products - did you talk about alternatives and why you selected the one you chose? Or is it just the only one you know? Awareness of the options, and the ability to select the appropriate option for a given purpose, is valuable.
- Caching - you have caching in front of the file storage, but nothing in front of the microservices (an edge cache or a front-end reverse proxy). Assuming the microservices add some value to the process, caching their results might also be useful.
- Redundancy and durability of data - while you talk about resiliency with failover, redundancy and durability of the stored data are key requirements in a system like this, and an explicit reference to how they would be achieved would be useful. Note that this is slightly different to availability of the services.
- Performance - you talk about introducing a caching layer to improve performance, but don't quantify the actual performance requirements: hundreds of objects stored or retrieved per second, thousands, or millions? You need to know that to decide what to build.
- Global Access - S3 is a multi-region/multi-datacentre solution; your architecture does not address any aspect of multi-datacentre operation, such as replication of the stored objects and metadata.
- Security - you reference AAA requirements, but your proposed solution doesn't define which component is responsible for security, at which layer, or at what point in the request path a request is verified and accepted or rejected.
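On the redundancy and durability point, one common technique worth naming in an interview is quorum replication: store each object on N replicas, treat a write as successful once W of them acknowledge it, and read from R replicas, choosing R + W > N so that every read set overlaps the latest successful write set. The sketch below is a simplified, single-process illustration with hypothetical in-memory replicas and a single-writer version counter. It is not a description of how S3 works internally, and it ignores failure detection, repair of stale replicas and multi-region placement.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical storage node: maps key -> (version, data)."""
    name: str
    up: bool = True
    data: Dict[str, Tuple[int, bytes]] = field(default_factory=dict)


class QuorumStore:
    """N replicas; a write needs W acks, a read consults R replicas, with R + W > N."""

    def __init__(self, replicas: List[Replica], w: int, r: int) -> None:
        if r + w <= len(replicas):
            raise ValueError("need R + W > N so every read overlaps the latest write")
        self.replicas, self.w, self.r = replicas, w, r
        self.version = 0  # single-writer version counter; real systems need more care

    def put(self, key: str, value: bytes) -> bool:
        self.version += 1
        acks = 0
        for rep in self.replicas:
            if rep.up:  # a down replica simply misses this write (no repair here)
                rep.data[key] = (self.version, value)
                acks += 1
        return acks >= self.w  # report success only if a write quorum acknowledged

    def get(self, key: str) -> Optional[bytes]:
        live = [rep for rep in self.replicas if rep.up][: self.r]
        if len(live) < self.r:
            raise RuntimeError("read quorum not available")
        copies = [rep.data[key] for rep in live if key in rep.data]
        if not copies:
            return None
        return max(copies)[1]  # highest version wins over stale copies
```

In an interview, stating the chosen N, W and R (for example, three replicas in separate failure domains) and what happens when a replica is lost is likely to matter more than the code itself.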

6. The Good

Lest this critique be thought too negative, it's worth saying that there is a lot to like in your approach: your assessment of the likely requirements is thorough, and it's great to see security, operational monitoring and SLAs considered up front.

However, reviewing this, I wonder what kind of job it actually was. Your answer reads more like one for a cloud architect role than for a software architect role, for which I'd expect to see more discussion of packages, modules, assemblies, libraries and software components.

All of the above notwithstanding, it's also worth considering - what is an interviewer looking for if they ask this in an interview? Nobody expects you to propose an architecture in 15 minutes that can do what has taken a team of Amazon engineers and architects many years to build and refine! They are looking for clarity of thought and expression, thoroughness of examination, logical conclusions from clearly stated assumptions, and knowledge and awareness of industry standards and practices.

Hope this is helpful, and best of luck on the job hunt!

answered Sep 29 '22 10:09

Chris Simon

Tags: architecture, caching, amazon-s3, restful-architecture