
Java based high volume transaction web application

I have next to no experience dealing with high volume transactional websites and recently came across this interesting question. I am interested in knowing where the bottlenecks in a Java web application would occur under high load (thousands of requests per second). If someone could give me a high level approach to thinking about the following question, that would be great!

The only thing I've come up with is to use memcached to cache the database look-ups but I don't know how to calculate the amount of time each request will take and therefore how many requests per second the system might be able to handle.

Question: Internet-scale applications must be designed to process high volumes of transactions. Describe a design for a system that must process on average 30,000 HTTP requests per second. For each request, the system must perform a look-up into a dictionary of 50 million words, using a key word passed in via the URL query string. Each response will consist of a string containing the definition of the word (100 bytes or less).

Describe the major components of the system, and note which components should be custom-built and which components could leverage third-party applications. Include hardware estimates for each component. Please note that the design should include maximum performance at minimum hardware / software-licensing costs.

Document the rationale in coming up with the estimates.

Describe how the design would change if the definitions are 10 kilobytes each.

JMM asked Jun 20 '10 07:06



1 Answer

As background, you may want to look at benchmarks such as SPECmarks. Those scenarios involve significantly more processing per request than yours, but they show that your 30,000 req/sec is a comparatively high, but not insanely high, figure.

You may also find Joines et al. useful. (Disclaimer: they're colleagues.)

In your scenario I would expect in descending order of cost:

  1. Database retrieval
  2. Network activity reading and returning requests
  3. Simple processing
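The network item in this list can be sized directly from the numbers in the question. The following is a rough sketch only: it counts response payload bytes and ignores HTTP headers, TCP overhead, and request traffic, and it reads "10 kilobytes" as 10 KiB. The class and method names are made up for illustration.

```java
// Back-of-envelope payload bandwidth for the two response sizes in the question.
// ASSUMPTION: only definition bytes are counted; headers and protocol overhead are ignored.
final class BandwidthEstimate {

    /** Payload bytes per second for a given request rate and response size. */
    static long payloadBytesPerSecond(long requestsPerSecond, long responseBytes) {
        return requestsPerSecond * responseBytes;
    }

    /** Converts bytes per second to megabits per second (1 Mbit = 1,000,000 bits). */
    static double toMegabitsPerSecond(long bytesPerSecond) {
        return bytesPerSecond * 8 / 1_000_000.0;
    }

    public static void main(String[] args) {
        long rate = 30_000;          // requests per second, from the question
        long smallResponse = 100;    // bytes, from the question
        long largeResponse = 10 * 1024; // "10 kilobytes", taken here as 10 KiB

        long smallRate = payloadBytesPerSecond(rate, smallResponse);
        long largeRate = payloadBytesPerSecond(rate, largeResponse);

        System.out.printf("100-byte definitions: %.1f Mbit/s%n", toMegabitsPerSecond(smallRate));
        System.out.printf("10 KiB definitions:   %.1f Mbit/s%n", toMegabitsPerSecond(largeRate));
    }
}
```

On these assumptions the 100-byte case needs about 24 Mbit/s of payload, while the 10 KB case needs roughly 2.5 Gbit/s, so the network moves up the list when the definitions get larger.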

You're not doing complex processing (e.g. graphic rendering or rocket-science type math). So, first guess: if your dictionary were a database, then the cost of doing a query is going to dominate everything else. Traditionally, when we hit bottlenecks in the Web/App server tier we scale by adding more instances, but if the database is the bottleneck that's more of a problem. So one direction to explore: what performance can you expect from a database engine, and does 30k tps seem feasible?

Your first observation, caching, is a commonly used strategy. Here you have (presumably) random hits across the whole dictionary, so caching only recent answers is probably not going to help, unless ... can you cache the whole thing?

50,000,000 * (100 + overhead) == ??

On a 64-bit JVM on a 64-bit OS, maybe it fits?
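To put a number on the formula above, here is a minimal sketch. The per-entry overhead is an assumption, not a measurement: 100 bytes is used as a stand-in for the key string, object headers, and hash-table entry, and the real figure depends on the JVM and the data structure chosen. The class and method names are hypothetical.

```java
// Rough in-memory size of the whole dictionary: entries * (definition + overhead).
final class CacheSizeEstimate {

    /** Total bytes needed for all entries, given definition size and per-entry overhead. */
    static long totalBytes(long entries, long definitionBytes, long overheadBytesPerEntry) {
        return entries * (definitionBytes + overheadBytesPerEntry);
    }

    /** Converts bytes to GiB (1 GiB = 2^30 bytes). */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long entries = 50_000_000L; // dictionary size, from the question
        long overhead = 100;        // ASSUMPTION: key + object headers + map entry, in bytes

        long small = totalBytes(entries, 100, overhead);       // 100-byte definitions
        long large = totalBytes(entries, 10 * 1024, overhead); // 10 KiB definitions

        System.out.printf("100-byte definitions: %.1f GiB%n", toGiB(small));
        System.out.printf("10 KiB definitions:   %.1f GiB%n", toGiB(large));
    }
}
```

With that assumed overhead the 100-byte case comes to roughly 9.3 GiB, which a single large 64-bit heap might hold, while the 10 KB case comes to several hundred GiB and would not fit on one ordinary machine.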

If not (and as the data gets really big, then probably not), then we need to scale, and one strategy is to slice the cache. Have (for example) 4 servers, serving A-F, G-M, N-S and T-Z respectively (note: 4 separate caches or 4 separate databases), and have a dispatcher directing each request to the right one.
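A dispatcher along those lines could be sketched as follows. This is an illustration only: the shard names are placeholders, the third range is assumed to run N-S so that every letter is covered, and words are assumed to start with an ASCII letter.

```java
import java.util.TreeMap;

// Routes a lookup word to one of four shards by its first letter.
// ASSUMPTION: shard names are hypothetical; a real dispatcher would hold host addresses.
final class ShardDispatcher {

    // Key = first letter of each range, value = shard that owns the range.
    private final TreeMap<Character, String> shardsByRangeStart = new TreeMap<>();

    ShardDispatcher() {
        shardsByRangeStart.put('a', "shard-1"); // a-f
        shardsByRangeStart.put('g', "shard-2"); // g-m
        shardsByRangeStart.put('n', "shard-3"); // n-s
        shardsByRangeStart.put('t', "shard-4"); // t-z
    }

    /** Returns the shard responsible for the given word. */
    String shardFor(String word) {
        if (word == null || word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        char first = Character.toLowerCase(word.charAt(0));
        if (first < 'a' || first > 'z') {
            throw new IllegalArgumentException("word must start with a letter a-z: " + word);
        }
        // floorEntry finds the range whose starting letter is closest at or below 'first'.
        return shardsByRangeStart.floorEntry(first).getValue();
    }

    public static void main(String[] args) {
        ShardDispatcher dispatcher = new ShardDispatcher();
        for (String word : new String[] {"apple", "garden", "quark", "zebra"}) {
            System.out.println(word + " -> " + dispatcher.shardFor(word));
        }
    }
}
```

Letter ranges are easy to reason about, but words are not evenly spread across initial letters, so the shards may end up unevenly loaded; hashing the key is a common alternative if balance matters more than readable ranges.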

djna answered Sep 20 '22 13:09