Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL - Select where first character is lowercase or uppercase

Sorry if this is something that was asked before, but I am not able to find the answer to this. How do I create a MySQL select statement to select all strings from the database whose first characters is either in lowercase or uppercase.

I'd normally use something like this to select 10 random words from English Word Database with length of more than 8 characters:

SELECT word FROM en_dictionary WHERE CHAR_LENGTH(word)>8 ORDER BY RAND() LIMIT 10

Unfortunately, this also selects the words in Capital letters that I don't want in the results, as they are usually the terms such as names of cities, species etc.:

enter image description here

So, my question is: Is there a way to select only the words that start with the lowercase alphabet characters (or uppercase)?

I could do something like this and it'd work, but it'd be a very ugly statement:

AND word LIKE BINARY 'a%' AND word LIKE BINARY 'b%' AND word LIKE BINARY 'c%'...
like image 401
jjj Avatar asked Oct 30 '18 11:10

jjj


2 Answers

You can use Ascii() function. It returns the numeric value of the leftmost character of the input string.

For lowercase first character: ASCII code of a is 97 and z is 122. So, the ASCII code of the first character should be Between 97 and 122.

SELECT word 
FROM en_dictionary 
WHERE CHAR_LENGTH(word) > 8 
  AND ASCII(word) BETWEEN 97 AND 122
ORDER BY RAND() LIMIT 10

For uppercase first character: ASCII code of A is 65 and Z is 90. So, the ASCII code of the first character should be Between 65 and 90.

SELECT word 
FROM en_dictionary 
WHERE CHAR_LENGTH(word) > 8 
  AND ASCII(word) BETWEEN 65 AND 90
ORDER BY RAND() LIMIT 10
like image 195
Madhur Bhaiya Avatar answered Oct 12 '22 23:10

Madhur Bhaiya


Not a total answer, but way to large for a comment so i made a answer out of it.

I advice you to use a case sensitive utf8 collate like utf8_bin.
Then you don't need to use BINARY or other MySQL "hacks" to get the results you need you can simply use LIKE/BETWEEN with case sensitivity without problems.

Create table

CREATE TABLE test (

    utf8_general_ci_word VARCHAR(255) COLLATE utf8_general_ci
  , utf8_bin_word VARCHAR(255) COLLATE utf8_bin
);

INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('a', 'a');
INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('A', 'A');

INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('b', 'b');
INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('B', 'B');

Results

Query #1

    SELECT 
     utf8_general_ci_word
    FROM
     test 
    WHERE
     utf8_general_ci_word LIKE 'a%'

    ;

| utf8_general_ci_word |
| -------------------- |
| a                    |
| A                    |

Query #2

    SELECT 
     utf8_general_ci_word
    FROM
     test 
    WHERE
     utf8_general_ci_word LIKE BINARY 'a%'

    ;

| utf8_general_ci_word |
| -------------------- |
| a                    |

Query #3

    SELECT 
     utf8_bin_word
    FROM
     test 
    WHERE
     utf8_bin_word LIKE 'a%';

| utf8_bin_word |
| ------------- |
| a             |

demo

So your query would be something like

Query

SELECT
  word
FROM
  en_dictionary
WHERE
    CHAR_LENGTH(word) > 8
  AND
    word BETWEEN 'a' AND 'z'
ORDER BY
 RAND()
LIMIT 10

demo on the test data

Edited on 08-11-2019

This will miss words like 'zahn' because it's not <= 'z', use 'zzzzzzz' instead

Thanks to dnoeth 's comment and a very late sparring session later on 07-11-2019 there is indeed a bug in the SQL above for some cases..

As word BETWEEN 'a' AND 'z' is syntax sugar for word >= 'a' AND word <= 'z' this will indeed not match words greater then z like zahn or zzzzzzzzz for that matter. A valid patch for the query above is to use REPEAT('<char>', <number_of_size_of_table_column_datatype>) as the max range meaning in this case word BETWEEN 'a' AND REPEAT('z', 255)

the demo of the patch..

like image 39
Raymond Nijland Avatar answered Oct 12 '22 23:10

Raymond Nijland