 

How to learn about designing highly transactional systems?

I have mostly worked on data analysis, BI tools, etc. in my career. Most of the applications I work on are largely read-only. I have also worked on simple CRUD applications, but nothing heavily transactional. As a software engineer, I feel there is a gap in my learning if I don't know how to design highly transactional systems and databases, such as the ones behind Amazon, airline reservation systems, etc. I would like to ask the community to suggest resources, books, or simple projects on this subject, ideally something that takes a hands-on approach while teaching the necessary theory. I know this is a subjective matter, but I can mark the most useful answer green. Looking forward to your suggestions, and thanks in advance.

Kumar Vaibhav asked Dec 29 '15 12:12




1 Answer

I am going to organize the answer in four broad categories, namely

  1. theoretical and academic background,
  2. popular sources,
  3. software and tools, and
  4. exercises.

Books and Papers

These are the foundations of the field: how to go from zero to a fairly expert level, though mostly in theory.

Intro Level

Transaction Processing: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems) by Jim Gray and Andreas Reuter

The Silberschatz book (Database System Concepts) covers the internal workings of advanced transactional systems in its later chapters and points to further resources.

Database Specific

H-Store paper - describes the benefits of an in-memory design for high transactional loads. The H-Store work inspired the development of VoltDB.

Calvin paper - Fast Distributed Transactions for Partitioned Database Systems. Gives a very good background, related work, and an insight into the state of the art.

Architecture of a Database System by Hellerstein, Stonebraker and Hamilton covers many aspects.

Limitations and Boundaries

A great paper on the virtues and limitations of highly available transactions.

CAP Theorem paper - on the design tradeoffs among consistency, availability, and partition tolerance for large-scale systems. Very important.

Parallel Processing and Parallel Databases

Popular and Current Sources

Blogs

High Scalability is a perfect blog for what you are looking for. Here is, for example, a great entry on the evolution of Amazon's architecture. It is very close to what you are after.

The Facebook, LinkedIn, and Twitter engineering blogs are great resources. I would also check the Google Research site and their Google+. Netflix is not bad either.

Conferences

The VLDB and SIGMOD conferences (including the SIGMOD blog) are where many of the most advanced data systems are presented by researchers, academia, and corporations.

HPTS is an interesting conference/workshop with a good agenda and publications.

I would even check the USENIX series for cutting-edge systems work.

Case Study Architectures

VoltDB is an in-memory database built for very high transaction rates, designed by Mike Stonebraker, ACM Fellow and "father" of many modern database concepts.

The IBM mainframe still has a very prominent place in the world of high-volume transaction processing. At the time of writing, IBM is touting its z13 system for extreme, encrypted transaction processing volumes.

If you are interested in doing transactions the "Big Data" way, there are lots of choices, but HBase is probably the most interesting one. Here are some suggested reading sources for HBase: Yahoo's Omid (built on HBase) and Transactions over HBase.

Another interesting architecture is Twitter's Storm (now Apache Storm), together with Apache Kafka, for streaming and real-time processing.

Benchmarks and Exercises

If you want to try a few things out, look at the TPC family of benchmarks. There are transactional, ETL, BI, and decision support/mixed-load analytic benchmarks. These are relational-oriented.

You can take these benchmarks and run them against open source SMP databases (e.g. PostgreSQL, MySQL) and MPP databases such as Greenplum (its documentation is comprehensive and covers querying, performance, some sample setups, and how MPP databases process queries).

I recommend these practical scenarios and architectures for HBase-oriented transactional systems.

For state-of-the-art message- and actor-oriented transactional systems, you will probably need to buy a book or two. For Akka (which Spark has used internally) you could use Akka in Action and go through the exercises at the end of each chapter. There are also some exercises from the training sessions here.

For stream processing, here are some good exercises with Apache Kafka (parts 1 and 2: http://www.confluent.io/blog/stream-data-platform-2/). Cloudera has a good "getting started" guide.

To practice with state-of-the-art message-oriented systems, I would suggest Getting Started with Storm, and perhaps going through these exercises. A number of real topologies are featured.

For good old JMS, you can use this online reference for practice, or go more sophisticated with these ActiveMQ exercises.

If you want to torture yourself with the mainframe, try this emulator. It emulates IBM's OS/370-390.

Edmon answered Nov 30 '22 23:11