Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sane way to store different data types within same column in postgres?

I'm currently attempting to modify an existing API that interacts with a postgres database. Long story short, it's essentially stores descriptors/metadata to determine where an actual 'asset' (typically this is a file of some sort) is storing on the server's hard disk.

Currently, its possible to 'tag' these 'assets' with any number of undefined key-value pairs (i.e. uploadedBy, addedOn, assetType, etc.) These tags are stored in a separate table with a structure similar to the following:

+---------------+----------------+-------------+
|assetid (text) | tagid(integer) | value(text) |
|---------------+----------------+-------------|
|someStringValue| 1234           | someValue   |
|---------------+----------------+-------------|
|aDiffStringKey | 1235           | a username  |
|---------------+----------------+-------------|
|aDiffStrKey    | 1236           | Nov 5, 1605 |
+---------------+----------------+-------------+

assetid and tagid are foreign keys from other tables. Think of the assetid representing a file and the tagid/value pair is a map of descriptors.

Right now, the API (which is in Java) creates all these key-value pairs as a Map object. This includes things like timestamps/dates. What we'd like to do is to somehow be able to store different types of data for the value in the key-value pair. Or at least, storing it differently within the database, so that if we needed to, we could run queries checking date-ranges and the like on these tags. However, if they're stored as text items in the db, then we'd have to a.) Know that this is actually a date/time/timestamp item, and b.) convert into something that we could actually run such a query on.

There is only 1 idea I could think of thus far, without complete changing changing the layout of the db too much.

It is to expand the assettag table (shown above) to have additional columns for various types (numeric, text, timestamp), allow them to be null, and then on insert, checking the corresponding 'key' to figure out what type of data it really is. However, I can see a lot of problems with that sort of implementation.

Can any PostgreSQL-Ninjas out there offer a suggestion on how to approach this problem? I'm only recently getting thrown back into the deep-end of database interactions, so I admit I'm a bit rusty.

like image 770
K.Niemczyk Avatar asked Aug 14 '13 13:08

K.Niemczyk


People also ask

Can a SQL column have multiple data types?

So the answer is: no, a column in a relational database (that honors the SQL standard) can not have multiple data types.

What is character varying data type in PostgreSQL?

PostgreSQL supports a character data type called VARCHAR. This data type is used to store characters of limited length. It is represented as varchar(n) in PostgreSQL, where n represents the limit of the length of the characters. If n is not specified it defaults to varchar which has unlimited length.

Which class can be used to match columns with the same name but different data types?

In SQL:1999, the join of two tables returning only matched rows is called an join. A specifies that all left outer rows be returned. The clause can also be used to match columns that have the same name but different data types.


2 Answers

You've basically got two choices:

Option 1: A sparse table

Have one column for each data type, but only use the column that matches that data type you want to store. Of course this leads to most columns being null - a waste of space, but the purists like it because of the strong typing. It's a bit clunky having to check each column for null to figure out which datatype applies. Also, too bad if you actually want to store a null - then you must chose a specific value that "means null" - more clunkiness.

Option 2: Two columns - one for content, one for type

Everything can be expressed as text, so have a text column for the value, and another column (int or text) for the type, so your app code can restore the correct value in the correct type object. Good things are you don't have lots of nulls, but importantly you can easily extend the types to something beyond SQL data types to application classes by storing their value as json and their type as the class name.

I have used option 2 several times in my career and it was always very successful.

like image 139
Bohemian Avatar answered Oct 12 '22 03:10

Bohemian


Another option, depending on what your doing, could be to just have one value column but store some json around the value...

This could look something like:

  {
    "type": "datetime",
    "value": "2019-05-31 13:51:36" 
  } 

That could even go a step further, using a Json or XML column.

like image 44
zak Avatar answered Oct 12 '22 02:10

zak