Why is upper casing not enough for case-insensitive comparison?

To compare two strings case insensitively, one correct way is to case fold them first. How is this better than upper casing or lower casing?

Online I can find examples where lower casing fails. For example, "σ" and "ς" (the two lowercase forms of "Σ") remain distinct when converted to lower case. But I've failed to find why case folding is better than mapping to upper case. Is there a case where two strings that should match case insensitively don't upper case to the same string?

Another scenario is when I want to store a case insensitive index. The recommended way seems to be case folding and then normalizing. What are its advantages over storing the string mapped to upper case and normalized? The specs say mapping to upper case is not guaranteed to be stable across versions of Unicode while case folding is. But are there any cases where mapping to upper case gives a different string in an earlier version of Unicode?

93Iq2Gg2cZtLMO asked Apr 15 '21 10:04


1 Answer

As per Unicode stability policy, case mappings are only stable for case pairs, i.e. pairs of characters X and Y where X is the full uppercase mapping of Y, and Y is the full lowercase mapping of X. Only when both these characters exist with these properties is the casing relation between them set in stone.

However, Unicode contains many “incomplete” case pairs where only the lowercase form has been encoded and the uppercase form is missing completely. This is usually the case for letters used in transcription systems that are traditionally lowercase-only. Should capital forms be discovered and subsequently added to Unicode, these letters would then receive a new uppercase mapping.

The most recent characters this has happened to are “ʂ” (from Unicode 1.1), “ᶎ” (from Unicode 4.1), and “ꞔ” (from Unicode 7.0), which all got brand new uppercase forms (Ʂ, Ᶎ, and Ꞔ respectively) in Unicode 12.0, released in 2019.
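
As a rough illustration, the Python sketch below checks how “ʂ” behaves under the Unicode data bundled with the running interpreter. The variable names are made up for this example, and which branch runs depends on your Python build (an interpreter with pre-12.0 data leaves “ʂ” unchanged when uppercasing). The case-folded form is the same either way.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
UNICODE_VERSION = tuple(int(part) for part in unicodedata.unidata_version.split("."))

small_s_hook = "\u0282"    # ʂ LATIN SMALL LETTER S WITH HOOK (Unicode 1.1)
capital_s_hook = "\ua7c5"  # Ʂ LATIN CAPITAL LETTER S WITH HOOK (Unicode 12.0)

upper_result = small_s_hook.upper()
folded_result = small_s_hook.casefold()

if UNICODE_VERSION >= (12, 0, 0):
    # The capital form exists, so uppercasing now yields a different string
    # than it did under older Unicode data.
    print("upper gives the new capital:", upper_result == capital_s_hook)
else:
    # Older data: no capital form yet, so uppercasing is a no-op.
    print("upper leaves the letter unchanged:", upper_result == small_s_hook)

# Case folding maps to the lowercase letter regardless of the version.
print("casefold is unchanged:", folded_result == small_s_hook)
```

An index keyed on the uppercased string would therefore need rebuilding after a Unicode upgrade, while one keyed on the case-folded string would not.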

Because case mappings do not have to be unique, this makes uppercasing a poor substitute for proper case-folding. For example, both U+0434 (д) and U+1C81 (ᲁ) uppercase to U+0414 (Д), but only the former is locked into a case pair by virtue of being U+0414’s full lowercase mapping. If someone were to find a dedicated capital letter version of U+1C81 in some old manuscript, it would be given a new uppercase mapping, resulting in U+0434 and U+1C81 suddenly no longer comparing equal under that operation.
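
The current state of these three characters can be inspected with a short Python sketch. It assumes an interpreter whose Unicode data includes U+1C81 (Unicode 9.0 or later, which should hold for Python 3.6 and newer); the variable names are illustrative only.

```python
de = "\u0434"              # д CYRILLIC SMALL LETTER DE
long_legged_de = "\u1c81"  # ᲁ CYRILLIC SMALL LETTER LONG-LEGGED DE
capital_de = "\u0414"      # Д CYRILLIC CAPITAL LETTER DE

# Today both small letters uppercase to the same capital...
same_upper = de.upper() == long_legged_de.upper() == capital_de
print("same uppercase:", same_upper)

# ...but only д is the lowercase mapping of Д, so only that pair is locked in.
forms_case_pair = capital_de.lower() == de
print("д and Д form a case pair:", forms_case_pair)
print("ᲁ lowercases to itself:", long_legged_de.lower() == long_legged_de)

# Case folding sends all three to д, and that mapping is covered by the
# stability policy.
folded = {ch.casefold() for ch in (de, long_legged_de, capital_de)}
print("single folded form:", folded == {de})
```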

EDIT: I have just remembered a current example of uppercasing not being sufficient for case-insensitive matching: U+1E9E (ẞ) is already a capital letter and thus uppercases to itself. Its lowercase counterpart is U+00DF (ß), but the uppercase mapping of U+00DF is the sequence <U+0053, U+0053> (SS).

uppercase("ẞ") ≠ uppercase(lowercase("ẞ"))
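
The same inequality can be reproduced in Python, whose built-in str.upper(), str.lower(), and str.casefold() apply the full case mappings. This is a minimal sketch; the helper names equal_by_upper and equal_by_casefold are invented for this example.

```python
capital_sharp_s = "\u1e9e"  # ẞ LATIN CAPITAL LETTER SHARP S
small_sharp_s = "\u00df"    # ß LATIN SMALL LETTER SHARP S


def equal_by_upper(a: str, b: str) -> bool:
    """Compare two strings after uppercasing both (the approach in question)."""
    return a.upper() == b.upper()


def equal_by_casefold(a: str, b: str) -> bool:
    """Compare two strings after case folding both."""
    return a.casefold() == b.casefold()


# ẞ uppercases to itself, while ß uppercases to "SS".
print(capital_sharp_s.upper() == capital_sharp_s)  # True
print(small_sharp_s.upper())                       # SS

# The two letters are a case variant of each other, yet uppercasing misses it.
print(equal_by_upper(capital_sharp_s, small_sharp_s))     # False
print(equal_by_casefold(capital_sharp_s, small_sharp_s))  # True
```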

CharlotteBuff answered Oct 18 '22 20:10