Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the definition of cardinality in SQL

My school book Database Systems defines cardinality as follows:

The cardinality of a relation is the number of tuples it contains. By contrast, the number of tuples is called the cardinality of the relation and this changes as tuples are added or deleted. High-cardinality - many tuples, low-cardinality - few tuples.

While the Wikipedia article on Cardinality (SQL statements), defines it as follows:

Cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table. The lower the cardinality, the more duplicated elements in a column. There are 3 types of cardinality: high-cardinality, normal-cardinality, and low-cardinality.

They might both be right, but i can't connect the two definitions as related definitions. A rephrase would be appriciated!

like image 533
Tobias Schultz Avatar asked Aug 28 '14 11:08

Tobias Schultz


People also ask

What is the correct definition of cardinality?

The term cardinality refers to the number of cardinal (basic) members in a set. Cardinality can be finite (a non-negative integer) or infinite. For example, the cardinality of the set of people in the United States is approximately 270,000,000; the cardinality of the set of integers is denumerably infinite.

How do you show cardinality in SQL?

There is way to calculate cardinality in SQL statements which is : Select distinct Count(column_name) from Tablename; Definition of Database Cardinality for SQL Statements : The database cardinality is nothing but the uniqueness of values in SQL tables which helps to determine Query plan for performance optimization.

What is cardinality and degree in SQL?

Answer: Cardinality refers to the number of tuples/rows in a table whereas, Degree refers to the number of attributes/columns in a table.


3 Answers

They are speaking the same thing and it has to do with tuples (relational algebra) or rows (layman's term).

When it says high-cardinality are possible values of particular attribute (or field) that are unique and therefore the number of rows or tuples are higher:

Example:

 StudentID   Lastname Firstname  Gender
 101         Smith    John       M
 102         Jones    James      M
 103         Mayo     Ann        F
 104         Jones    George     M
 105         Smith    Suse       F

As far as as StudentID the cardinality is high because it is unique. In this it has five (5) tuples/rows.

On the other hand Lastname has normal cardinality, in particular there are only three (3) unique tuples/rows. Thus it has normal cardinality.

And finally Gender has only two possible unique tuples thus Low Cardinality.

You probably confuse Cardinality here with Degree of a relation which has something to do of the number of attributes/fields in a relation (or table).

On the other hand the textbook for Database when speaking of Cardinality normally has to do with an entity in relation to another entity, that is, the number of possible relations occurences for an entity participating in a given relationship type. Thus for example for a binary relationship cardinality could be either one-to-one, one-to-many or many-to-many.

like image 137
Edper Avatar answered Oct 31 '22 18:10

Edper


Both definitions are trying to say that cardinality is the "number of rows". The difference is whether the comparison is "in the table" or "in a particular column".

The version in your database text book focuses on relational algebra and the structure of tables ("relations" in that lingo).

The Wikipedia entry is more practical. It encompasses the textbook definition, assuming the table has a primary key (the cardinality of the primary key is the same as the table). However, it can be applied to, say, a flag column as well. If the flag only takes on two values (0 versus 1), then we can say that the cardinality of the column is 2.

This is important for optimizing queries. Cardinality is one component of choosing the best methods for joining, aggregating, and selecting data. In practice, most databases use more information than the cardinality, so-called "statistics" about columns and their values for optimization.

like image 38
Gordon Linoff Avatar answered Oct 31 '22 17:10

Gordon Linoff


There are two concepts

  1. Index cardinality
  2. Cardinality

This I believe is referring to index cardinality, which is VERY different https://www.ibm.com/developerworks/data/library/techarticle/dm-1309cardinal/

  • Index cardinality is considered the number of unique values in the index
  • The term is used to discuss creating indexes, table scans, index access vs table access, how it affects inserts, updates, deletes, storage space

Here is another example, https://en.wikipedia.org/wiki/Cardinality_(SQL_statements)

  • In SQL (Structured Query Language), the term cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table.
  • The lower the cardinality, the more duplicated elements in a column. Thus, a column with the lowest possible cardinality would have the same value for every row. SQL databases use cardinality to help determine the optimal query plan for a given query.

Just the word cardinality, I believe focuses on relationships between tables In particular, it is not a term used to discuss a single table or uniqueness of data

IBM documentation (if you search for the word unique, it is not mentioned) https://www.ibm.com/support/knowledgecenter/en/SSEP7J_10.2.2/com.ibm.swg.ba.cognos.ug_cog_rlp.10.2.2.doc/c_cog_rlp_rel_cardinality.html When you interpret cardinality, you must consider the notation that displays at both ends of the relationship. Possible end labels are shown in the following list:

  • 0..1 (zero or one match)
  • 1..1 (exactly one match)
  • 0..n (zero or more matches)
  • 1..n (one or more matches)

In mathematics, the cardinality of a set is a measure of the "number of elements of the set". (no mention of unique btw) https://en.wikipedia.org/wiki/Cardinality

In database design, the cardinality or fundamental principle of one data aspect with respect to another is a critical feature. The relationship of one to the other must be precise and exact between each other in order to explain how each aspect links together. In the relational model, tables can be related as any of "one-to-many", "many-to-many" "one-to-zero-or-one", etc.. This is said to be the cardinality of a given table in relation to another. https://en.wikipedia.org/wiki/Cardinality_(data_modeling)

like image 30
VAI Jason Avatar answered Oct 31 '22 18:10

VAI Jason