We are in the process of migrating from MySQL to PGSQL and we have a 100 million row table.
When I was trying to ascertain how much space both systems use, I found much less difference for tables, but found huge differences for indexes.
MySQL indexes were occupying more size than the table data itself and postgres was using considerably lesser sizes.
When digging through for the reason, I found that MySQL uses B+ trees to store the indexes and postgres uses B-trees.
MySQL usage of indexes was a little different, it stores the data along with the indexes (due to which the increased size), but postgres doesn't.
Now the questions:
Comparing B-tree and B+ trees on database speak, it is better to use B+trees since they are better for range queries O(m) + O(logN) - where m in the range and lookup is logarithmic in B+trees?
Now in B-trees the lookup is logarithmic for range queries it shoots up to O(N) since it does not have the linked list underlying structure for the data nodes. With that said, why does postgres uses B-trees? Does it perform well for range queries (it does, but how does it handle internally with B-trees)?
The above question is from a postgres point of view, but from a MySQL perspective, why does it use more storage than postgres, what is the performance benefit of using B+trees in reality?
I could have missed/misunderstood many things, so please feel free to correct my understanding here.
Edit for answering Rick James questions
Additional Questions
PostgreSQL's use of B+ treesNormal ("btree" type) indexes in Postgres are not B+ trees. The distinction between B+ trees and B-trees is kind of nonsense for database indexes in the first place -- all the columns in the index itself *are* the lookup key, and they're the same on the leaf level as any other level.
A PostgreSQL Hash index can perform a faster lookup than a B-Tree index. However, the key downside of the Hash index is that its use is limited to equality operators that will perform matching operations.
PostgreSQL comes with no less than 6 different types of indexes, with the B-Tree index being the most commonly used.
B tree is a self-balancing tree, and it is a m-way tree where m defines the order of the tree. Btree is a generalization of the Binary Search tree in which a node can have more than one key and more than two children depending upon the value of m.
First, and foremost, if you are not using InnoDB, close this question, rebuild with InnoDB, then see if you need to re-open the question. MyISAM is not preferred and should not be discussed.
How did you build the indexes in MySQL? There are several ways to explicitly or implicitly build indexes; they lead to better or worse packing.
MySQL: Data and Indexes are stored in B+Trees composed of 16KB blocks.
MySQL: UNIQUE
indexes (including the PRIMARY KEY
) must be updated as you insert rows. So, a UNIQUE
index will necessarily have a lot of block splits, etc.
MySQL: The PRIMARY KEY
is clustered with the data, so it effectively takes zero space. If you load the data in PK order, then the block fragmentation is minimal.
Non-UNIQUE
secondary keys may be built on the fly, which leads to some fragmentation. Or they can be constructed after the table is loaded; this leads to denser packing.
Secondary keys (UNIQUE
or not) implicitly include the PRIMARY KEY
in them. If the PK is "large" then the secondary keys are bulky. What is your PK? Is this the 'answer'?
In theory, totally random inserts into a BTree lead to a the blocks being about 69% full. Maybe this is the answer. Is MySQL 45% bigger (1/69%)?
With 100M rows, probably many operations are I/O-bound because you don't have enough RAM to cache all the data and/or index blocks needed. If everything is cached, then B-Tree versus B+Tree won't make much difference. Let's analyze what needs to happen for a range query when things are not fully cached.
With either type of Tree, the operation starts with a drill-down in the Tree. For MySQL, 100M rows will have a B+Tree of about 4 levels deep. The 3 non-leaf nodes (again 16KB blocks) will be cached (if they weren't already) and be reused. Even for Postgres, this caching probably occurs. (I don't know Postgres.) Then the range scan starts. With MySQL it walks through the rest of the block. (Rule of Thumb: 100 rows in a block.) Ditto for Postgres?
At the end of the block something different has to happen. For MySQL, there is a link to the next block. That block (with 100 more rows) is fetched from disk (if not cached). For a B-Tree the non-leaf nodes need to be traversed again. 2, probably 3 levels are still cached. I would expect the need for another non-leaf node to be fetched from disk only 1/10K rows. (10K = 100*100) That is, Postgres might hit the disk 1% more often than MySQL, even on a "cold" system.
On the other hand, if the rows are so fat that only 1 or 2 can fit in a 16K block, the "100" I kept using is more like "2", and the 1% becomes maybe 50%. That is, if you have big rows this could be the "answer". Is it?
What is the block size in Postgres? Note that many of the computations above depend on the relative size between the block and the data. Could this be an answer?
Conclusion: I've given you 4 possible answers. Would you like to augment the question to confirm or refute that each of these apply? (Existence of secondary indexes, large PK, inefficient building of secondary indexes, large rows, block size, ...)
Addenda about PRIMARY KEY
For InnoDB, another thing to note... It is best to have a PRIMARY KEY
in the definition of the table before loading the data. It is also best to sort the data in PK order before LOAD DATA
. Without specifying any PRIMARY KEY
or UNIQUE
key, InnoDB builds a hidden 6-byte PK; this is usually sub-optimal.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With