Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a reason for a Primary Key column that would never be used?

I have a routine that will be creating individual tables (Sql Server 2008) to store the results of reports generated by my application (Asp.net 3.5). Each report will need its own table, as the columns for the table would vary based on the report settings. A table will contain somewhere between 10-5,000 rows, rarely more than 10,000.

The following usage rules will apply:

  • Once stored, the data will never be updated.
  • Whenever results for the table are accessed, all data will be retrieved.
  • No other table will need to perform a join with this table.

Knowing this, is there any reason to create a PK index column on the table? Will doing so aid the performance of retrieving the data in any way, and if it would, would this outweigh the extra load of updating the index when inserting data (I know that 10K records is a relatively small amount, but this solution needs to be able to scale).

Update: Here are some more details on the data being processed, which goes into the current design decision of one table per report:

  • Tables will record a set of numeric values (set at runtime based on the report settings) that correspond to a different set of reference varchar values (also set at runtime based on the report settings).
  • Whenever data is retrieved, it some post-processing on the server will be required before the output can be displayed to the user (thus I will always be retrieving all values).

I would also be suspicious of someone claiming that they had to create a new table for each time the report was run. However, given that different columns (both in number, name and datatype) could conceivably be needed for every time the report was run, I don't see a great alternative.

The only other thing I can think of is to have an ID column (identifying the ReportVersionID, corresponding to another table), ReferenceValues column (varchar field, containing all Reference values, in a specified order, separated by some delimiter) and NumericValues column (same as ReferenceValues, but for the numbers), and then when I retrieve the results, put everything into specialized objects in the system, separating the values based on the defined delimiter). Does this seem preferable?

like image 456
Yaakov Ellis Avatar asked Oct 18 '10 13:10

Yaakov Ellis


People also ask

When would you not use a primary key?

You can make an arguement to never use a primary key, if performance, data integrity, and normalization are not required.

Is primary key always necessary for every table?

Every table can have (but does not have to have) a primary key. The column or columns defined as the primary key ensure uniqueness in the table; no two rows can have the same key. The primary key of one table may also help to identify records in other tables, and be part of the second table's primary key.

What is a meaningless primary key?

A meaningless primary key is a primary key that has no meaning to the business. The terms meaningful and meaningless are used, rather than intelligent and non-intelligent, because primary keys cannot possess intelligence.

What are the main restrictions by primary key if applied on any column?

Primary keys must contain UNIQUE values, and cannot contain NULL values. A table can have only ONE primary key; and in the table, this primary key can consist of single or multiple columns (fields).


3 Answers

Primary keys are not a MUST for any and all data tables. True, they are usually quite useful and to abandon them is unwise. However, in addition to a primary missions of speed (which I agree would doubtfully be positively affected) is also that of uniqueness. To that end, and valuing the consideration you've already obviously taken, I would suggest that the only need for a primary key would be to govern the expected uniqueness of the table.

Update: You mentioned in a comment that if you did a PK that it would include an Identity column that presently does not exist and is not needed. In this case, I would advise against the PK altogether. As @RedFilter pointed out, surrogate keys never add any value.

like image 200
Brad Avatar answered Oct 18 '22 18:10

Brad


I would keep it simple, just store the report results converted to json or xml, in a VARCHAR(MAX) column

like image 30
A-K Avatar answered Oct 18 '22 17:10

A-K


One of the most useful and least emphasized (explicitly) benefits of data integrity (primary keys and foreign key references to start with) is that it forces a 'design by contract' between your data and your application(s); which stops quite a lot of types of bugs from doing any damage to your data. This is such a huge win and a thing that is implicitly taken for granted (it is not 'the database' that protects it, but the integrity rules you specify; forsaking the rules you expose your data to various levels of degradation).

This seems unimportant to you (from the fact that you did not even discuss what would be a possible primary key) and your data seems quite unrelated to other parts of the system (from the fact that you will not do joins to any other tables); but still - if all things are equal I would model the data properly and then if primary keys (or other data integrity rules) are not used and if chasing every last bit of performance I would consider dropping them in production (and test for any actual gains).

As for comments that creating tables is a performance hit - that is true, but you did not tell us how temporary are these tables? Once created will they be heavily used before scrapped? Or do you plan to create tables for just dozen of read operations.

In case you will heavily use these tables and if you will provide clean mechanism for managing them (removing them when not used, selecting them, etc...) I think that dynamically creating the tables would be perfectly fine (you could have shared more details on the tables themselves; use case would be nice)

Notes on other solutions:

EAV model

is horrible unless very specific conditions are met (for example: flexibility is paramount and automating DDL is too much of a hassle). Keep away from it (or be very, very good at anticipating what kinds of queries will you have to deal with and rigorous in validating data on the front end).

XML/BLOB approach

might be the right thing for you if you will consume the data as XML/BLOBs at presentation layer (always read all of the rows, always write the whole 'object' and finally, if your presentation layer likes XML/BLOBS)

EDIT: Also, depending on the usage patterns, having primary key can indeed increase the speed of retrieval, and if I can read the fact that the data will not be updated as 'it will be written once and read many times' then there is a good chance that it will indeed overweight the cost of updating the index on inserts.

like image 1
Unreason Avatar answered Oct 18 '22 19:10

Unreason