In my application, users can create custom tables with three column types, Text, Numeric and Date. They can have up to 20 columns. I create a SQL table based on their schema using nvarchar(430)
for text, decimal(38,6)
for numeric and datetime
, along with an Identity Id column.
There is the potential for many of these tables to be created by different users, and the data might be updated frequently by users uploading new CSV files. To get the best performance during the upload of the user data, we truncate the table to get rid of existing data, and then do batches of BULK INSERT.
The user can make a selection based on a filter they build up, which can include any number of columns. My issue is that some tables with a lot of rows will have poor performance during this selection. To combat this I thought about adding indexes, but as we don't know what columns will be included in the WHERE condition we would have to index every column.
For example, on a local SQL server one table with just over a million rows and a WHERE condition on 6 of its columns will take around 8 seconds the first time it runs, then under one second for subsequent runs. With indexes on every column it will run in under one second the first time the query is ran. This performance issue is amplified when we test on an SQL Azure database, where the same query will take over a minute the first time its run, and does not improve on subsequent runs, but with the indexes it takes 1 second.
So, would it be a suitable solution to add a index on every column when a user creates a column, or is there a better solution?
Yes, it's a good idea given your model. There will, of course, be more overhead maintaining the indexes on the insert, but if there is no predictable standard set of columns in the queries, you don't have a lot of choices.
Suppose by 'updated frequently,' you mean data is added frequently via uploads rather than existing records being modified. In that case, you might consider one of the various non-SQL databases (like Apache Lucene or variants) which allow efficient querying on any combination of data. For reading massive 'flat' data sets, they are astonishingly fast.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With