 

Your first gut feeling on this SqlServer design question

We have two tables. One holds measurements, the other holds timestamps (one for every minute); every measurement holds a FK to a timestamp. We have 8M (million) measurements and 2M timestamps.

We are creating a report database via replication, and my first solution was this: when a new measurement is received via the replication process, look up the right timestamp and copy it into the measurement table. Yes, it duplicates data, but it is for reporting, and since we have measurements every 5 minutes and users can query for yearly data (~105,000 measurements) we have to optimize for speed.
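For concreteness, a minimal sketch of that first solution. The table and column names here are hypothetical, not our actual schema:

```sql
-- Hypothetical schema (names are illustrative):
--   Measurement(Id, TimestampId, Ts, Value)  -- Ts is the duplicated datetime
--   TimeDim(Id, Ts)                          -- one row per minute
-- On replication, look up the datetime once and store it on the row:
INSERT INTO dbo.Measurement (TimestampId, Ts, Value)
SELECT @TimestampId, td.Ts, @Value
FROM dbo.TimeDim AS td
WHERE td.Id = @TimestampId;
```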

But a co-developer said: you don't have to do that; we'll just query with a join on the two tables, SqlServer is so fast you won't see the difference.

My first reaction was: a join on two tables with 8M and 2M records can't make 'no difference'.

What is your first feeling on this?

EDIT: new measurements arrive at 400 records per 5 minutes.

EDIT 2: maybe the question was not clear:

The first solution is to get the data from the timestamp table and copy it to the measurement table when the measurement record is inserted. In that case we have an action when the record is inserted AND an extra (duplicated) timestamp value, but we only query ONE table because it holds all the data.

The second solution is to join the two tables in a query.
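For comparison, the second solution as a query, using the same hypothetical names as in the sketch above:

```sql
-- Solution 2: keep the tables normalized and join at query time.
SELECT m.Value, td.Ts
FROM dbo.Measurement AS m
JOIN dbo.TimeDim AS td
  ON td.Id = m.TimestampId
WHERE td.Ts >= '20220101'
  AND td.Ts <  '20230101';  -- one year of data, roughly 105,000 rows
```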

asked Jan 22 '23 by Michel

1 Answer

With the proper index the join will make no difference*. My initial thought is that if the report queries over the entire dataset, the join might actually be faster, because there are literally 6 million fewer timestamp values to read from disk (2M rows in the timestamp table versus 8M duplicated values in the measurement table).

*This is just a guess based on my experience with tables with millions of records. Your results will vary based on your queries.
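One possible shape for that "proper index", a sketch assuming the hypothetical schema used in the question's examples:

```sql
-- A covering index on the FK column, including the reported value,
-- so the join can be answered from the index alone instead of
-- reading the full measurement rows. (Hypothetical names; tune the
-- key and INCLUDE columns to the actual report queries.)
CREATE INDEX IX_Measurement_TimestampId
ON dbo.Measurement (TimestampId)
INCLUDE (Value);
```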

answered Feb 01 '23 by tster