Postgres hstore for time series

I am new to Postgres and am experimenting with the hstore extension, and I'm looking for some guidance. I need to support basic reporting on time series data for various products that we sell. I have a large amount of data in the format "Timestamp, Value" for each product. This data is available in a CSV file for each product.

I am thinking of using hstore to store this data in key-value format, assuming that all the time series data for a single product can be stored in a single hstore object. I need to be able to query this data by specific times, e.g. what was the value of a product at a given time? I also need to run simple queries such as retrieving the times when the product cost more than $100. I'm planning to have a table with a product id column and an hstore column, but I am not very clear on how to make this work:

The hstore column needs to be loaded from thousands of timestamp,value records that exist in a CSV. The hstore should be appended to whenever we get a new CSV.
The table needs to store the productId and the corresponding time series data.

Can you please advise whether using hstore would be helpful? If yes, how can I load data from CSV as explained above? Also, if there could be any performance impact on inserts/updates in the hstore as the data grows, please share your experiences.

976

asked Nov 13 '12 21:11

zing

1 Answer

I do think you should start with a simple, normalised schema first, especially since you are new to PostgreSQL. Something like:

CREATE TABLE product_data
(
    product TEXT,  -- I'm making an assumption about the types of your columns
    time TIMESTAMP,
    value DOUBLE PRECISION,

    -- One row per product per timestamp (no semicolon inside the column list)
    PRIMARY KEY (product, time)
);
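To address the CSV-loading part of the question with this schema, here is a minimal sketch. It assumes each CSV holds `timestamp,value` rows with no header line, that the file is readable by the database server, and that PostgreSQL 9.5 or later is available for `ON CONFLICT`. The staging table name `csv_staging`, the product name 'widget', and the file path are all hypothetical.

```sql
-- Target table, as described above (created only if it does not exist yet).
CREATE TABLE IF NOT EXISTS product_data
(
    product TEXT,
    time TIMESTAMP,
    value DOUBLE PRECISION,
    PRIMARY KEY (product, time)
);

-- Hypothetical staging table matching the two CSV columns.
CREATE TEMP TABLE csv_staging
(
    time TIMESTAMP,
    value DOUBLE PRECISION
);

-- Server-side load; the path is a placeholder.
-- From psql, \copy does the same with a file on the client machine.
COPY csv_staging (time, value)
FROM '/path/to/widget.csv'
WITH (FORMAT csv);

-- Attach the product id and append to the main table.
-- ON CONFLICT (PostgreSQL 9.5+) skips timestamps that were already loaded.
INSERT INTO product_data (product, time, value)
SELECT 'widget', time, value
FROM csv_staging
ON CONFLICT (product, time) DO NOTHING;

-- Empty the staging table before loading the next file.
TRUNCATE csv_staging;
```

Repeating the `COPY` and `INSERT` steps for each new file appends to the existing data without touching rows that are already there.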

I would definitely keep hstore and similar options in mind, if and when your data becomes large enough that efficiency is more important than simplicity. But note that every option comes with its own efficiency tradeoff.

Do you know how much data you're going to support? Number of products, number of distinct timestamps for each product?

What other queries do you want to run? A query for the times where a single product cost more than $100 would benefit from an index on (product, value), if the product has many distinct timestamps.
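As a rough illustration of the queries mentioned in the question, run against the `product_data` table above. The index name, the product name 'widget', and the timestamps are made up for the example.

```sql
-- Supports "times where this product's value exceeded some threshold".
CREATE INDEX product_data_product_value_idx
    ON product_data (product, value);

-- Value of a product at an exact timestamp (served by the primary key).
SELECT value
FROM product_data
WHERE product = 'widget'
  AND time = TIMESTAMP '2012-11-13 21:00:00';

-- Most recent value at or before a given moment, for when no row
-- exists at exactly that timestamp.
SELECT time, value
FROM product_data
WHERE product = 'widget'
  AND time <= TIMESTAMP '2012-11-13 21:30:00'
ORDER BY time DESC
LIMIT 1;

-- Times at which the product cost more than $100.
SELECT time
FROM product_data
WHERE product = 'widget'
  AND value > 100
ORDER BY time;
```

Whether the planner actually uses the (product, value) index depends on the data distribution, so it is worth checking with `EXPLAIN` on real data.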

Other options

hstore is most useful if you want to store a set of arbitrary key-value pairs in a row. You could use it here, with a row for each product, and each distinct timestamp for that product being a key in the product's hstore. The downsides are that keys and values in hstore are text, whereas your keys are timestamps and your values are numbers of some kind. So you lose some type checking and pay some extra cost for type casting. Another possible downside is that some queries on the hstore might not use indexes very efficiently. The table above can use simple btree indexes for range queries (say you want to pull out the values between two dates for a product). But hstore indexes are much more limited; you can use a gist or gin index on an hstore column to find all the rows that feature a certain key, but that does not help with range conditions over the keys inside a single row.
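For comparison, a minimal sketch of what the hstore layout could look like. The table name `product_series`, the column name `series`, and the sample data are hypothetical, and the last query relies on implicit `LATERAL` for functions in `FROM` (PostgreSQL 9.3 or later).

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- One row per product; every timestamp is a text key in the hstore.
CREATE TABLE product_series
(
    product TEXT PRIMARY KEY,
    series hstore NOT NULL DEFAULT ''::hstore
);

INSERT INTO product_series (product, series)
VALUES ('widget', hstore('2012-11-13 21:00:00', '99.5'));

-- Append (or overwrite) a reading with the || concatenation operator.
UPDATE product_series
SET series = series || hstore('2012-11-13 22:00:00', '101.25')
WHERE product = 'widget';

-- Value at a given time: the key must match the stored text exactly,
-- and the result has to be cast back to a number.
SELECT (series -> '2012-11-13 22:00:00')::double precision AS value
FROM product_series
WHERE product = 'widget';

-- Times where the value exceeded 100: expand the hstore with each()
-- and cast both key and value before filtering.
SELECT e.key::timestamp AS ts,
       e.value::double precision AS value
FROM product_series AS p,
     each(p.series) AS e
WHERE p.product = 'widget'
  AND e.value::double precision > 100
ORDER BY ts;
```

Note that each append writes a new copy of the whole hstore value for that product, and the threshold query has to expand and cast every pair, which is where the type-casting and indexing costs described above show up.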

Another option (which I've played with and use experimentally for some of my databases) is arrays. Basically, each product will have an array of values, and each timestamp maps to an index in the array. This is easy if the timestamps are perfectly regular. For example, if all your products had a value every hour for every day, you could use a table like this:

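CREATE TABLE product_data
(
    product TEXT,
    day DATE,
    -- One value per hour of the day (24 elements). Named "vals" because
    -- VALUES is a reserved word. PostgreSQL arrays are 1-based by default,
    -- so hour 0 is vals[1] and hour 23 is vals[24].
    vals DOUBLE PRECISION[],

    PRIMARY KEY (product, day)
);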

You can construct views and indexes to make querying this table moderately easy. (I wrote a blog post on this technique at http://ejrh.wordpress.com/2011/03/20/vector-denormalisation-in-postgresql/.)
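A small sketch of how such a view might look. This is not the code from the blog post; the names `product_hourly`, `vals`, and `product_hourly_flat` and the sample data are hypothetical, and `WITH ORDINALITY` assumes PostgreSQL 9.4 or later.

```sql
-- One row per product per day, with 24 hourly values in an array.
CREATE TABLE product_hourly
(
    product TEXT,
    day DATE,
    vals DOUBLE PRECISION[],
    PRIMARY KEY (product, day)
);

-- Sample day: every hour at 95.0, then hour 14 raised to 105.0.
INSERT INTO product_hourly (product, day, vals)
VALUES ('widget', DATE '2012-11-13',
        array_fill(95.0::double precision, ARRAY[24]));

UPDATE product_hourly
SET vals[15] = 105.0          -- arrays are 1-based, so hour 14 is index 15
WHERE product = 'widget'
  AND day = DATE '2012-11-13';

-- Value at a specific hour of a specific day.
SELECT vals[14 + 1] AS value
FROM product_hourly
WHERE product = 'widget'
  AND day = DATE '2012-11-13';

-- View that expands each array back into (product, ts, value) rows.
CREATE VIEW product_hourly_flat AS
SELECT p.product,
       p.day + (u.idx - 1) * INTERVAL '1 hour' AS ts,
       u.val AS value
FROM product_hourly AS p
CROSS JOIN LATERAL unnest(p.vals) WITH ORDINALITY AS u(val, idx);

-- The threshold query then reads like it does on the normalised table.
SELECT ts
FROM product_hourly_flat
WHERE product = 'widget'
  AND value > 100
ORDER BY ts;
```

The view keeps queries readable, but a filter on `value` still has to expand every array it touches, so this layout mainly pays off for storage size and for reading whole days at a time.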

But my advice is still: start with a simple table, then explore ways to improve efficiency when you know you're going to need them.

200

answered Dec 31 '22 19:12

Edmund


Tags:

postgresql

hstore
