
Database design for large amounts of data

I would like to store stock trading data for 1000 symbols. The data is converted from text files, so there is no need for inserts or updates; only read-only access will be required.

The data is grouped like this: each symbol has many records of the form {timestamp, price, quantity}, where each record represents one trade.

An approximate upper bound for one symbol is 5 records/second over an 8-hour working day, i.e. 5 x 60 x 60 x 8 = 144K records per day, so 1K symbols would generate 144M records per day.

Most operations on the data would be something like:

  • give me all records for a symbol for the period from Date D1, Time T1 to Date D2, Time T2
  • find the min/max/avg of price or quantity for the period [D1, T1 ... D2, T2] (both sketched in SQL below)
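
For concreteness, here is roughly what those two queries would look like in SQL, assuming a trades table shaped like the ones proposed in the answers below (the symbol and the timestamps are placeholder values):

    -- all records for one symbol between [D1, T1] and [D2, T2]
    SELECT qwhen, price, quantity
    FROM trades
    WHERE symbol_id = 'MSFT'
      AND qwhen >= '2013-10-25 09:00:00'   -- D1, T1
      AND qwhen <  '2013-10-28 17:00:00'   -- D2, T2
    ORDER BY qwhen;

    -- min/max/avg of price over the same period
    SELECT MIN(price), MAX(price), AVG(price)
    FROM trades
    WHERE symbol_id = 'MSFT'
      AND qwhen >= '2013-10-25 09:00:00'
      AND qwhen <  '2013-10-28 17:00:00';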

Now the question: what would be the best design for a database in this case?

  • Can I store all trades for a symbol in a single table? Tables would quickly grow too big in this case, though.
  • Shall I create a separate table per day/week/month, e.g. 2013-10-25_ABC (ABC = symbol name)? In this case we may get 1K new tables per day/week/month.
  • Or maybe plain text files would be enough in such a case? E.g., having all symbols' data as files under a 2013-10-15 folder, resulting in 1K files per folder.

The database may be either MS SQL or MySQL. The total time period is up to 5 years. Thank you!

asked Oct 29 '13 by Alex



3 Answers

That's a whole lot of data. Do look at NoSQL.

Using SQL, here are some basic ideas:

Put all price data in one table, using the smallest data types possible. Use a SymbolId (int) to reference the symbol, the smallest datetime type that covers your range, and the smallest monetary type that gives you enough precision.
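
A minimal MySQL sketch of such a table; names and exact types are illustrative assumptions (with only 1K symbols, the suggested int SymbolId can even shrink to a SMALLINT):

    CREATE TABLE prices (
        symbol_id SMALLINT  UNSIGNED NOT NULL,  -- references a symbols lookup table
        traded_at DATETIME           NOT NULL,  -- 5 bytes in MySQL 5.6+; TIMESTAMP is a byte smaller but only partitions via UNIX_TIMESTAMP()
        price     DECIMAL(8,2)       NOT NULL,  -- smallest monetary type that fits your price range
        quantity  MEDIUMINT UNSIGNED NOT NULL   -- smallest integer type that fits your trade sizes
    );

At roughly 14 bytes per row, 144M records is on the order of 2 GB of raw row data per day, before indexes and storage overhead.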

Do denormalize: make a second table with min/max/avg per day per SymbolId.
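
A sketch of that summary table, continuing the illustrative schema above, together with the query that would populate it (the quantity columns would follow the same pattern):

    CREATE TABLE daily_summary (
        symbol_id  SMALLINT UNSIGNED NOT NULL,
        trade_date DATE              NOT NULL,
        min_price  DECIMAL(8,2)      NOT NULL,
        max_price  DECIMAL(8,2)      NOT NULL,
        avg_price  DECIMAL(12,6)     NOT NULL,
        PRIMARY KEY (symbol_id, trade_date)
    );

    -- fill from the raw data; rerun per day as new files are loaded
    INSERT INTO daily_summary
    SELECT symbol_id, DATE(traded_at),
           MIN(price), MAX(price), AVG(price)
    FROM prices
    GROUP BY symbol_id, DATE(traded_at);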

Research horizontal partitioning, and add indexes that match your queries.
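
For example, in MySQL the prices table above could be range-partitioned by year, with a composite index matching the question's query pattern (the partition boundaries here are assumptions):

    ALTER TABLE prices
        ADD INDEX idx_symbol_time (symbol_id, traded_at);

    ALTER TABLE prices
        PARTITION BY RANGE (YEAR(traded_at)) (
            PARTITION p2013 VALUES LESS THAN (2014),
            PARTITION p2014 VALUES LESS THAN (2015),
            PARTITION pmax  VALUES LESS THAN MAXVALUE
        );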

answered Sep 19 '22 by digscoop


The third option is the best one. You need high read performance with almost negligible writes.

Your requirements (a single table, no relationships) are best suited to NoSQL databases; MySQL would be overkill. More info: NoSql Databases
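
As a sketch of what that third option's layout could look like, following the folder convention from the question (names purely illustrative):

    2013-10-15/
        ABC.txt     one line per trade: timestamp,price,quantity
        MSFT.txt
        ...         (1K files, one per symbol)
    2013-10-16/
        ABC.txt
        ...

A range query then reduces to reading the daily files between D1 and D2 for one symbol and filtering by time.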

answered Sep 21 '22 by Barun


Since you'll be running queries from one datetime to another, I wouldn't split the tables up at all. Instead, learn more about sharding. Below is the schema I would use:

CREATE TABLE symbols (
    id   VARCHAR(6)  PRIMARY KEY,  -- MSFT, GOOG, etc.
    name VARCHAR(50) NOT NULL      -- Microsoft, Google, etc.
    -- ...
);

CREATE TABLE trades (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    symbol_id VARCHAR(6)      NOT NULL,
    qwhen     DATETIME        NOT NULL,
    price     DOUBLE          NOT NULL,
    quantity  DOUBLE          NOT NULL,
    -- ...
    FOREIGN KEY (symbol_id) REFERENCES symbols (id)
);
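
One detail the schema leaves implicit (my assumption, not part of the original answer): the question's range queries will want a composite index on this table, e.g.:

    ALTER TABLE trades ADD INDEX idx_symbol_qwhen (symbol_id, qwhen);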
answered Sep 18 '22 by Benny Hill