Java based high volume transaction web application

Tags: java, performance, requests-per-second

I have next to no experience dealing with high volume transactional websites and recently came across this interesting question. I am interested in knowing where the bottlenecks in a Java web application would occur under high load (thousands of requests per second). If someone could give me a high level approach to thinking about the following question, that would be great!

The only thing I've come up with is to use memcached to cache the database look-ups but I don't know how to calculate the amount of time each request will take and therefore how many requests per second the system might be able to handle.

Question: Internet-scale applications must be designed to process high volumes of transactions. Describe a design for a system that must process on average 30,000 HTTP requests per second. For each request, the system must perform a look-up into a dictionary of 50 million words, using a key word passed in via the URL query string. Each response will consist of a string containing the definition of the word (100 bytes or less).

Describe the major components of the system, and note which components should be custom-built and which components could leverage third-party applications. Include hardware estimates for each component. Please note that the design should include maximum performance at minimum hardware / software-licensing costs.

Document the rationale in coming up with the estimates.

Describe how the design would change if the definitions are 10 kilobytes each.


asked Jun 20 '10 07:06

JMM

1 Answer

As background, you may want to look at benchmarks such as specmarks. Compared with your scenario they involve significantly more processing, but you will see that your 30,000 req/sec is a comparatively high, but not insanely high, figure.

You may also find Joines et al useful. (Disclaimer: they're colleagues.)

In your scenario I would expect the costs, in descending order, to be:

Database retrieval
Network activity reading and returning requests
Simple processing
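To put a rough number on the network item, here is a back-of-envelope sketch of the response payload bandwidth at the 30,000 req/sec from the question, for both the 100-byte and the 10-kilobyte definitions. The class and method names are made up for illustration, 10 KB is taken as 10 * 1024 bytes, and HTTP headers and TCP/IP framing are deliberately ignored, so real traffic would be higher.

```java
/** Back-of-envelope response payload bandwidth for the lookup service. */
final class ResponseBandwidth {

    /** Payload bytes sent per second, ignoring HTTP headers and TCP/IP framing. */
    static long payloadBytesPerSecond(long requestsPerSecond, long bytesPerResponse) {
        return requestsPerSecond * bytesPerResponse;
    }

    /** Converts bytes per second to megabits per second (1 Mbit = 1,000,000 bits). */
    static double megabitsPerSecond(long bytesPerSecond) {
        return bytesPerSecond * 8 / 1_000_000.0;
    }

    public static void main(String[] args) {
        long rate = 30_000; // requests per second, from the question

        long small = payloadBytesPerSecond(rate, 100);       // 100-byte definitions
        long large = payloadBytesPerSecond(rate, 10 * 1024); // 10 KB definitions (assumed 10 * 1024 bytes)

        System.out.printf("100 B definitions: %.1f Mbit/s%n", megabitsPerSecond(small));
        System.out.printf("10 KB definitions: %.1f Mbit/s%n", megabitsPerSecond(large));
    }
}
```

With 100-byte definitions the payload is about 24 Mbit/s, which is modest. With 10 KB definitions it is roughly 2.5 Gbit/s, so the network moves much higher up the list.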

You're not doing complex processing (e.g. graphics rendering or rocket-science-type math). So, first guess: if your dictionary were a database, then the cost of doing a query is going to dominate everything else. Traditionally, when we hit bottlenecks in the Web/App server tier we scale by adding more instances, but if the database is the bottleneck, that's more of a problem. So one direction to explore: what performance can you expect from a database engine? Does 30k tps seem feasible?

Your first observation, caching, is a commonly used strategy. Here you have (presumably) random hits across the whole dictionary, so caching recent answers is in itself probably not going to help, unless ... can you cache the whole thing?

50,000,000 * (100 + overhead) == ??

On a 64-bit JVM on a 64-bit OS, maybe it fits?
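The sum above can be filled in once you pick a value for the overhead. The sketch below assumes 100 bytes of overhead per entry (the key itself, object headers, hash table bookkeeping). That figure is a guess for illustration, not a measurement, and the class name is made up, so measure on your own JVM before trusting the result.

```java
/** Rough heap estimate for holding the whole dictionary in memory. */
final class DictionaryMemoryEstimate {

    /** Total bytes = entries * (definition size + assumed per-entry overhead). */
    static long estimateBytes(long entries, long definitionBytes, long overheadBytesPerEntry) {
        return entries * (definitionBytes + overheadBytesPerEntry);
    }

    /** Converts bytes to GiB (1 GiB = 1024^3 bytes). */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long entries = 50_000_000L; // dictionary size, from the question
        long overhead = 100;        // ASSUMPTION: per-entry overhead in bytes, not measured

        long small = estimateBytes(entries, 100, overhead);       // 100-byte definitions
        long large = estimateBytes(entries, 10 * 1024, overhead); // 10 KB definitions

        System.out.printf("100 B definitions: %.1f GiB%n", toGiB(small));
        System.out.printf("10 KB definitions: %.1f GiB%n", toGiB(large));
    }
}
```

Under that assumption the 100-byte case comes to a bit over 9 GiB, which a single 64-bit JVM with a large heap could plausibly hold. The 10 KB case comes to several hundred GiB, which is where a single cache stops being realistic.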

If not (and as the data gets really big, it probably won't), then we need to scale out. Hence a strategy of slicing the cache may be used: have (for example) 4 servers serving A-F, G-M, N-S and T-Z respectively (note: that means 4 separate caches or 4 separate databases), and have a dispatcher directing each request to the right one.
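A minimal sketch of such a dispatcher, assuming four shards split by first letter so that every letter is covered (A-F, G-M, N-S, T-Z). The server names `dict-1` to `dict-4` and the class itself are hypothetical. A real dispatcher would forward the HTTP request rather than just return a name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Routes a lookup key to one of several dictionary shards by its first letter. */
final class ShardDispatcher {

    // Maps the first letter of each range to the (hypothetical) server that owns it.
    private final TreeMap<Character, String> rangeStartToServer = new TreeMap<>();

    ShardDispatcher() {
        rangeStartToServer.put('a', "dict-1"); // a-f
        rangeStartToServer.put('g', "dict-2"); // g-m
        rangeStartToServer.put('n', "dict-3"); // n-s
        rangeStartToServer.put('t', "dict-4"); // t-z
    }

    /** Returns the server responsible for the given word. */
    String serverFor(String word) {
        if (word == null || word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        char first = word.toLowerCase(Locale.ROOT).charAt(0);
        // floorEntry finds the range whose starting letter is <= the word's first letter.
        Map.Entry<Character, String> entry = rangeStartToServer.floorEntry(first);
        // Keys sorting before 'a' (digits, punctuation) fall back to the first shard.
        return entry != null ? entry.getValue() : rangeStartToServer.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ShardDispatcher dispatcher = new ShardDispatcher();
        for (String word : new String[] {"apple", "Guitar", "quartz", "zebra"}) {
            System.out.println(word + " -> " + dispatcher.serverFor(word));
        }
    }
}
```

Letter ranges are easy to reason about, but words are not spread evenly across the alphabet, so some shards will be hotter than others. Hashing the key and taking it modulo the number of servers is a common alternative that tends to balance better.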


answered Sep 20 '22 13:09

djna
