I read that
SELECT
is a horizontal partition of the relation into two set of tuples.
and
PROJECT
is a vertical partition of the relation into two relations.
However, I don't understand what that means. Can you explain it in layman's terms?
Horizontal partitioning involves putting different rows into different tables. For example, customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest.
Horizontal vs. Horizontal partitioning means that all rows matching the partitioning function will be assigned to different physical partitions. Vertical partitioning allows different table columns to be split into different physical partitions. Currently, MySQL supports horizontal partitioning but not vertical.
Like horizontal partitioning, vertical partitioning lets queries scan less data. This increases query performance. For example, a table that contains seven columns of which only the first four are generally referenced may benefit from splitting the last three columns into a separate table.
Not a complete answer to the question but it answers what is asked in the question title. So the general meaning of horizontal and vertical database partitioning is:
Horizontal partitioning involves putting different rows into different tables. Perhaps customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers.
Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.
See more details here.
A projection creates a subset of attributes in a relation hence a "vertical partition"
A selection creates a subset of the tuples in a relation hence a "horizontal partition"
Given a table (r)
as
a : b : c : d : e
-----------------
1 : 2 : 3 : 4 : 5
1 : 2 : 3 : 4 : 5
2 : 2 : 3 : 4 : 5
2 : 2 : 3 : 4 : 5
An expression such as
PROJECT a, b (SELECT a=1 (r))
-- SELECT a, b FROM r WHERE a=1
Would "do"
a : b | c : d : e
-----------------
1 : 2 | 3 : 4 : 5
1 : 2 | 3 : 4 : 5
================= < -- horizontal partition (by SELECTION)
2 : 2 | 3 : 4 : 5
2 : 2 | 3 : 4 : 5
^ -- vertical partition (by PROJECTION)
Resulting in
a : b
------
1 : 2
1 : 2
Necromancing.
I think the existing answers are too abstract.
So here my attempts at a more practical explanation:
Partitioning form a developer's point of view is all about performance.
More exactly, it's about what happens when you have large amounts of data in your tables, and you still want to query the data fast.
Here some excerpts from slides by Bill Karwin about what exactly horizontal partitioning is all about:
The above is bad, because:
HORIZONTAL PARTITONING
Horizontal partitioning divides a table into multiple tables. Each table then contains the same number of columns, but fewer rows.
The difference: Query Performance and simplicity
"Tribbles" can also accumulate in columns. Example:
The solution to that problem is VERTICAL PARTITIONING
Proper normalization is ONE form of vertical partitioning
To quote technet
Vertical partitioning divides a table into multiple tables that contain fewer columns.
The two types of vertical partitioning are normalization and row splitting:
Normalization is the standard database process of removing redundant columns from a table and putting them in secondary tables that are linked to the primary table by primary key and foreign key relationships.
Row splitting divides the original table vertically into tables with fewer columns. Each logical row in a split table matches the same logical row in the other tables as identified by a UNIQUE KEY column that is identical in all of the partitioned tables. For example, joining the row with ID 712 from each split table re-creates the original row. Like horizontal partitioning, vertical partitioning lets queries scan less data. This increases query performance. For example, a table that contains seven columns of which only the first four are generally referenced may benefit from splitting the last three columns into a separate table. Vertical partitioning should be considered carefully, because analyzing data from multiple partitions requires queries that join the tables.
Vertical partitioning also could affect performance if partitions are very large.
That sums it up nicely.
This SO post describes the difference as such:
Select Operation : This operation is used to select rows from a table (relation) that specifies a given logic, which is called as a
predicate
. The predicate is a user defined condition to select rows of user's choice.Project Operation : If the user is interested in selecting the values of a few attributes, rather than selection all attributes of the Table (Relation), then one should go for
PROJECT
Operation.
SELECT is an actual SQL operation (statement), while PROJECT is a term used in relational algebra.
Judging from you posting this on SO and not on MathOverflow, I would suggest you don't read relational algebra books if you just want to learn SQL for developing applications.
If you are in dire need of a recommendation for a good book about (advanced) SQL, here is one
SQL Antipatterns: Avoiding the Pitfalls of Database Programming
Bill Karwin
ISBN-13: 978-1934356555
ISBN-10: 1934356557
That's the one book about SQL worth reading.
Most other books about SQL that I've seen out there can be summed up by this cynical statement about photoshop books:
There are more books about photoshop than people actually using photoshop.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With