
Case sensitive uniqueness and case insensitive search

I have a table with a field a using encoding utf8 and collation utf8_unicode_ci:

CREATE TABLE dictionary (
    a varchar(128) NOT NULL
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

The collation utf8_unicode_ci is required for an efficient case-insensitive search that handles expansions and ligatures. For this purpose I have the index:

CREATE INDEX a_idx on dictionary(a);

Problem: In addition, I must ensure that all stored values of the field a are unique, but in a case-sensitive way. German example: "blühen" and "Blühen" must both be stored in the table, but adding "Blühen" a second time should not be possible.
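A minimal sketch of the conflict, assuming a scratch table named dictionary_ci (a hypothetical name, not part of my schema) with the same column definition plus a unique index:

```sql
-- Hypothetical scratch table: same column and collation as above, plus a UNIQUE key.
CREATE TABLE dictionary_ci (
    a VARCHAR(128) NOT NULL,
    UNIQUE KEY a_uniq (a)
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO dictionary_ci (a) VALUES ('blühen');  -- accepted
INSERT INTO dictionary_ci (a) VALUES ('Blühen');  -- rejected as a duplicate: the index compares case-insensitively
```

The unique index inherits the column collation, so it treats both spellings as the same key.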

Is there built-in functionality in MySQL to have both?

Unfortunately it seems not to be possible to set the collation for the index in MySQL 5.1.

Solutions to this problem include a uniqueness check before insert or a trigger. Both are far less elegant than using a unique index.

user1091141 asked Jan 02 '12 15:01


2 Answers

There are two ways to accomplish this:

  1. use a _bin collation
  2. change the datatype to VARBINARY

Case 1: using _bin collation

Create your table as follows:

CREATE TABLE `dictionary` (
 `a` VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
 UNIQUE KEY `idx_un_a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Please note:

  1. the utf8_bin collation on column a
  2. the UNIQUE index on column a
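As a rough sketch of how this behaves, assuming the Case 1 table above and a utf8 connection character set, the inserts and lookups could look like this. The per-query COLLATE override is one possible way to keep a case-insensitive search; MySQL generally cannot use the utf8_bin index for that comparison, so treat it as an illustration rather than a tuned solution.

```sql
-- Assumes the Case 1 table: column a uses utf8_bin and has a UNIQUE key.
INSERT INTO dictionary (a) VALUES ('blühen');  -- accepted
INSERT INTO dictionary (a) VALUES ('Blühen');  -- accepted: differs in case
INSERT INTO dictionary (a) VALUES ('Blühen');  -- rejected: exact duplicate

-- A plain comparison now follows utf8_bin and is case sensitive.
SELECT a FROM dictionary WHERE a = 'blühen';

-- Overriding the collation for a single query makes the match case-insensitive again.
SELECT a FROM dictionary WHERE a COLLATE utf8_unicode_ci = 'blühen';
```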

Case 2: using VARBINARY datatype

Create your table as follows:

CREATE TABLE `dictionary` (
 `a` VARBINARY(128) NOT NULL,
 UNIQUE KEY `idx_uniq_a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Please note:

  1. the new datatype VARBINARY
  2. the UNIQUE index on column a
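A short sketch of what changes with VARBINARY, assuming the Case 2 table above and a utf8 connection character set (so the stored bytes are UTF-8). The column length now counts bytes rather than characters, and the value has to be converted back to a character string before a case-insensitive collation can be applied.

```sql
-- Assumes the Case 2 table: column a is VARBINARY(128) with a UNIQUE key.
INSERT INTO dictionary (a) VALUES ('blühen'), ('Blühen');  -- both accepted: the bytes differ

-- LENGTH counts bytes; 'blühen' occupies 7 bytes in UTF-8 because 'ü' takes two.
SELECT a, LENGTH(a) FROM dictionary;

-- Convert to a utf8 string first, then compare with a case-insensitive collation.
SELECT CONVERT(a USING utf8) AS word
FROM dictionary
WHERE CONVERT(a USING utf8) COLLATE utf8_unicode_ci = 'blühen';
```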

Both approaches satisfy the uniqueness requirement: they allow values such as 'abc', 'Abc', 'ABC' and 'aBc' to coexist, but reject a value that matches an existing one exactly, including case.

Please note that using a "_bin" collation is different from using a binary datatype. The following references explain the difference:

  1. The BINARY and VARBINARY datatypes
  2. The _bin and binary Collations

I hope the above helps!

Abhay answered Nov 15 '22 19:11


You can achieve this by adding an additional column a_lower.

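CREATE TABLE `dictionary` (
  `a` VARCHAR(128) NOT NULL,
  `a_lower` VARCHAR(128) NOT NULL,
  -- utf8_bin compares case sensitively, so this key enforces case-sensitive uniqueness
  UNIQUE KEY `idx_un_a` (`a`),
  -- non-unique index on the lowercased copy, used for case-insensitive search
  KEY `idx_a_lower` (`a_lower`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;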

An insert then looks like this:

INSERT INTO dictionary SET a = 'Blühen', a_lower = LOWER('Blühen');

A select can now be case-insensitive:

SELECT * FROM dictionary WHERE a_lower LIKE LOWER('search_term%');

Note that a fully indexed column is limited in length. InnoDB allows at most 767 bytes per index key with the COMPACT or REDUNDANT row format. With the utf8mb4 character set a character can take up to 4 bytes, so the column can hold at most 767 / 4 = 191.75, rounded down to 191 characters. With the utf8 character set a character takes at most 3 bytes, so the limit is 767 / 3 = 255.67, rounded down to 255 characters.
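The arithmetic can be checked directly in MySQL. This is only a sketch of the calculation; the column aliases are made up for illustration.

```sql
-- Integer division of the 767-byte key limit by the maximum bytes per character.
SELECT 767 DIV 4 AS max_chars_utf8mb4,  -- 191
       767 DIV 3 AS max_chars_utf8;     -- 255
```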

broadband answered Nov 15 '22 19:11