I would like to store stock trading data for 1000 symbols. The data is actually converted from text files so there is no need for inserts and updates; only read-only access will be required.
The data is basically grouped like this: each symbol has many records: {timestamp, price, quantity}
, each record represents a trade.
An approximate upperbound of data for one symbol is 5 records/second, 8 hours for each working day, i.e. 5x60x60x8 = 144K per day. I.e. 1K symbols would generate 144M records per day.
Most of operations over the data would be something like:
Now the question: what would be the best design for a database in this case?
The database may be either MS SQL or MySQL. The total time period - up to 5 years. Thank you!
8) Best Databases for 2021: Cassandra It offers high scalability which is very important for handling huge amounts of data in industries. On top of that, its Decentralized Database supports automatic Data Replication, as well as Multi-datacenter Replication. More information about Cassandra can be found here.
Big data defined The definition of big data is data that contains greater variety, arriving in increasing volumes and with more velocity. This is also known as the three Vs. Put simply, big data is larger, more complex data sets, especially from new data sources.
Yeah, it can handle billions of records. If you properly index tables, they fit in memory and your queries are written properly then it shouldn't be an issue.
That's a whole lot of data. Do look at NoSQl.
Using SQL, here are some basic ideas:
Put all price data in a table, using as small data types as possible. Use a SymbolId (int) to reference the symbol, the smallest datetime type needed, the smallest monetary type needed.
Do denormalize. Make a second table with min/max/avg per day and SymbolId.
Research horizontal partitioning and use indexes.
Third option is the best 1. You need high Read performance with almost negligible writes.
You requirements are best suited for NoSql databases. Single table with no relationships; MySQL would be overkill. More info --> NoSql Databases
Since you'll be running queries from one datetime to another I wouldn't split tables up at all. Instead, learn more about sharding. Below is the schema I would use:
symbols
id varchar(6) // MSFT, GOOG, etc.
name varchar(50) // Microsoft, Google, etc.
...
trades
id unsigned bigint(P)
symbol_id varchar(6)(F symbols.id)
qwhen datetime
price double
quantity double
...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With