Are there any open source projects in the works to create a column-oriented storage engine for PostgreSQL? I know Yahoo created one in-house, and that there are commercial products built on PostgreSQL that are column-oriented.
Postgres is a full featured open source DB that has both traditional row based storage (sometimes called “heapfiles”) as well as a columnar store extension (cstore_fdw).
As of 9.4, PostgreSQL is a row-oriented database server. The project and requirements described here have the objective of implementing performance improvements using column storage techniques.
PostgreSQL has one storage engine; MySQL has nine, but only two of those really matter to most users: MyIsam and InnoDB. MyIsam was the original engine, built for speed, but it lacked transactions; InnoDB has transactions and is speedier than MyIsam, which is why it's the default storage engine.
In row-oriented databases such as PostgreSQL, data pages (or blocks) store the values of an entire row together, sequentially for each column in the row.
Citus Data has developed an open source columnar store extension for PostgreSQL. It is available under the Apache License v2.0. It supports PostgreSQL 9.3 and higher.
First, creation the extension and a foreign server:
CREATE EXTENSION cstore_fdw; CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
Next, create some foreign tables:
CREATE FOREIGN TABLE customer_reviews ( customer_id TEXT, review_date DATE, review_rating INTEGER, review_votes INTEGER, review_helpful_votes INTEGER, product_id CHAR(10), product_title TEXT, product_sales_rank BIGINT, product_group TEXT, product_category TEXT, product_subcategory TEXT, similar_product_ids CHAR(10)[] ) SERVER cstore_server OPTIONS(filename '/opt/citusdb/3.0/cstore/customer_reviews.cstore', compression 'pglz');
Finally, COPY
data into the table:
COPY customer_reviews FROM '/home/user/customer_reviews_1998.csv' WITH CSV;
Foreign tables can be queried like any other table. You can even join them with regular tables.
More examples and information are available in a related blog post and the project's home page.
The lack of responses here and my own research seems to indicate that there are indeed no open source initiatives to add column storage to PostgreSQL.
There was some talk in 2008 about Yahoo possibly outsourcing Everest (their column store back end for PostgreSQL), so here's hoping that they'll release it.
Greenplum has created a column-oriented storage engine for PostgreSQL.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With