What's your approach for optimizing large tables (+1M rows) on SQL Server?

I'm importing Brazilian stock market data into a SQL Server database. Right now I have a table with price information for three kinds of assets: stocks, options and forwards. I'm still on 2006 data and the table already has over half a million records. I have 12 more years of data to import, so the table will exceed a million records for sure.

Now, my first approach for optimization was to keep the data to a minimum size, so I reduced the row size to an average of 60 bytes, with the following columns:

[Stock] [int] NOT NULL
[Date] [smalldatetime] NOT NULL
[Open] [smallmoney] NOT NULL
[High] [smallmoney] NOT NULL
[Low] [smallmoney] NOT NULL
[Close] [smallmoney] NOT NULL
[Trades] [int] NOT NULL
[Quantity] [bigint] NOT NULL
[Volume] [money] NOT NULL

Now, my second approach for optimization was to make a clustered index. The primary key is automatically clustered, and I made it a compound index with the Stock and Date fields. This is unique; I can't have two quotes for the same stock on the same day.
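
For reference, a rough sketch of that definition (the table and constraint names here are placeholders; the column types are the ones listed above):

    CREATE TABLE dbo.Quotes (
        [Stock]    int           NOT NULL,
        [Date]     smalldatetime NOT NULL,
        [Open]     smallmoney    NOT NULL,
        [High]     smallmoney    NOT NULL,
        [Low]      smallmoney    NOT NULL,
        [Close]    smallmoney    NOT NULL,
        [Trades]   int           NOT NULL,
        [Quantity] bigint        NOT NULL,
        [Volume]   money         NOT NULL,
        -- the primary key doubles as the compound clustered index
        CONSTRAINT PK_Quotes PRIMARY KEY CLUSTERED ([Stock], [Date])
    );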

The clustered index makes sure that quotes for the same stock stay together, and probably ordered by date. Is this second assumption true?

Right now, with half a million records, it's taking around 200 ms to select 700 quotes for a specific asset. I believe this number will get higher as the table grows.

Now, for a third approach I'm thinking of maybe splitting the table into three tables, each for a specific market (stocks, options and forwards). This would cut each table down to roughly a third of the current size. Will this approach help, or does it not matter much? Right now the table is 50 MB, so it can fit entirely in RAM without much trouble.

Another approach would be using the partitioning feature of SQL Server. I don't know much about it, but I think it's normally used when tables are large and can be spread across multiple disks to reduce I/O latency, am I right? Would partitioning be helpful in this case? I believe I could put the newest values (the latest years) and the oldest values in different partitions. The probability of seeking the newest data is higher, and with a small partition it will probably be faster, right?
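
For illustration, a minimal sketch of what a by-year partitioning setup could look like (the function, scheme, filegroup and boundary values are all made up):

    -- Hypothetical partition-by-year setup; names and boundary dates are illustrative.
    CREATE PARTITION FUNCTION pfQuotesByYear (smalldatetime)
        AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01', '2009-01-01');

    -- Map every partition to the same filegroup for simplicity; each one could
    -- instead go to a different filegroup/disk to spread the I/O.
    CREATE PARTITION SCHEME psQuotesByYear
        AS PARTITION pfQuotesByYear ALL TO ([PRIMARY]);

    -- The table (or its clustered index) would then be created ON psQuotesByYear([Date]).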

What other approaches would be good to make this as fast as possible? The main SELECT usage of the table will be seeking a specific range of records for a specific asset, like the latest 3 months of asset X. There will be other usages, but this will be the most common, possibly executed by more than 3k users concurrently.
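
The typical query would be something along these lines (using the placeholder table name from above; the parameter value is made up):

    -- Latest 3 months of a single asset; the compound clustered index on
    -- (Stock, Date) lets this run as a single range seek.
    DECLARE @StockId int;
    SET @StockId = 42;  -- hypothetical asset id

    SELECT [Date], [Open], [High], [Low], [Close], [Trades], [Quantity], [Volume]
    FROM dbo.Quotes
    WHERE [Stock] = @StockId
      AND [Date] >= DATEADD(MONTH, -3, GETDATE())
    ORDER BY [Date];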

Edwin Jarvis asked Dec 03 '08


People also ask

How do I optimize a large table in SQL Server?

Another solution would be to create a historical table for each year, put all these tables in a historical database, fill them in, and then create the appropriate indexes for them. Once you are done with this you won't have to touch them ever again. Why would you have to keep on inserting data?
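
A rough sketch of that idea, assuming a separate historical database and the column layout from the question (all names are illustrative):

    -- Copy one year of data into a per-year table in a historical database,
    -- then build its index once; after that the table is never touched again.
    SELECT *
    INTO HistoricalQuotes.dbo.Quotes2006
    FROM dbo.Quotes
    WHERE [Date] >= '2006-01-01' AND [Date] < '2007-01-01';

    CREATE UNIQUE CLUSTERED INDEX IX_Quotes2006_Stock_Date
        ON HistoricalQuotes.dbo.Quotes2006 ([Stock], [Date]);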


1 Answer

At 1 million records, I wouldn't consider this a particularly large table needing unusual optimization techniques such as splitting the table up, denormalizing, etc. But those decisions will come when you've tried all the normal means that don't affect your ability to use standard query techniques.

Now, my second approach for optimization was to make a clustered index. The primary key is automatically clustered, and I made it a compound index with the Stock and Date fields. This is unique; I can't have two quotes for the same stock on the same day.

The clustered index makes sure that quotes for the same stock stay together, and probably ordered by date. Is this second assumption true?

It's logically true - the clustered index defines the logical ordering of the records on the disk, which is all you should be concerned about. SQL Server may forego the overhead of sorting within a physical block, but it will still behave as if it did, so it's not significant. Querying for one stock will probably be 1 or 2 page reads in any case; and the optimizer doesn't benefit much from unordered data within a page read.
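
If you want to see the actual page reads for a single-stock lookup, a quick check could look like this (a sketch, assuming the placeholder table name used earlier):

    -- Report logical/physical page reads for the query.
    SET STATISTICS IO ON;

    SELECT [Date], [Close]
    FROM dbo.Quotes
    WHERE [Stock] = 42;   -- hypothetical asset id

    SET STATISTICS IO OFF;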

Right now, with half a million records, it's taking around 200 ms to select 700 quotes for a specific asset. I believe this number will get higher as the table grows.

Not necessarily significantly. There isn't a linear relationship between table size and query speed. There are usually a lot more considerations that are more important. I wouldn't worry about it in the range you describe. Is that the reason you're concerned? 200 ms would seem to me to be great, enough to get you to the point where your tables are loaded and you can start doing realistic testing, and get a much better idea of real-life performance.

Now, for a third approach I'm thinking of maybe splitting the table into three tables, each for a specific market (stocks, options and forwards). This would cut each table down to roughly a third of the current size. Will this approach help, or does it not matter much? Right now the table is 50 MB, so it can fit entirely in RAM without much trouble.

No! This kind of optimization is so premature it's probably stillborn.

Another approach would be using the partitioning feature of SQL Server.

Same comment. You will be able to stick with a strictly logical, fully normalized schema design for a long time.

What other approaches would be good to make this as fast as possible?

The best first step is clustering on stock. Insertion speed is of no consequence at all until you are looking at multiple records inserted per second - I don't see anything anywhere near that level of activity here. This should get you close to maximum efficiency, because it will efficiently read every record associated with a stock, and that seems to be your most common access pattern. Any further optimization should be based on testing.
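
If the clustered index did not already exist as the primary key, it could be created explicitly along these lines (the index name is illustrative):

    -- Cluster on (Stock, Date) so all quotes for one stock are stored together.
    CREATE UNIQUE CLUSTERED INDEX IX_Quotes_Stock_Date
        ON dbo.Quotes ([Stock], [Date]);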

Answered by dkretz