Sorry if this is something that was asked before, but I am not able to find the answer to this. How do I create a MySQL select statement to select all strings from the database whose first characters is either in lowercase or uppercase.
I'd normally use something like this to select 10 random words from English Word Database with length of more than 8 characters:
SELECT word FROM en_dictionary WHERE CHAR_LENGTH(word)>8 ORDER BY RAND() LIMIT 10
Unfortunately, this also selects the words in Capital letters that I don't want in the results, as they are usually the terms such as names of cities, species etc.:
So, my question is: Is there a way to select only the words that start with the lowercase alphabet characters (or uppercase)?
I could do something like this and it'd work, but it'd be a very ugly statement:
AND word LIKE BINARY 'a%' AND word LIKE BINARY 'b%' AND word LIKE BINARY 'c%'...
You can use Ascii()
function. It returns the numeric value of the leftmost character of the input string.
For lowercase first character: ASCII code of a
is 97 and z
is 122. So, the ASCII code of the first character should be Between 97 and 122.
SELECT word
FROM en_dictionary
WHERE CHAR_LENGTH(word) > 8
AND ASCII(word) BETWEEN 97 AND 122
ORDER BY RAND() LIMIT 10
For uppercase first character: ASCII code of A
is 65 and Z
is 90. So, the ASCII code of the first character should be Between 65 and 90.
SELECT word
FROM en_dictionary
WHERE CHAR_LENGTH(word) > 8
AND ASCII(word) BETWEEN 65 AND 90
ORDER BY RAND() LIMIT 10
Not a total answer, but way to large for a comment so i made a answer out of it.
I advice you to use a case sensitive utf8 collate like utf8_bin
.
Then you don't need to use BINARY or other MySQL "hacks" to get the results you need you can simply use LIKE/BETWEEN with case sensitivity without problems.
Create table
CREATE TABLE test (
utf8_general_ci_word VARCHAR(255) COLLATE utf8_general_ci
, utf8_bin_word VARCHAR(255) COLLATE utf8_bin
);
INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('a', 'a');
INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('A', 'A');
INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('b', 'b');
INSERT INTO test (utf8_general_ci_word, utf8_bin_word) VALUES('B', 'B');
Results
Query #1
SELECT
utf8_general_ci_word
FROM
test
WHERE
utf8_general_ci_word LIKE 'a%'
;
| utf8_general_ci_word |
| -------------------- |
| a |
| A |
Query #2
SELECT
utf8_general_ci_word
FROM
test
WHERE
utf8_general_ci_word LIKE BINARY 'a%'
;
| utf8_general_ci_word |
| -------------------- |
| a |
Query #3
SELECT
utf8_bin_word
FROM
test
WHERE
utf8_bin_word LIKE 'a%';
| utf8_bin_word |
| ------------- |
| a |
demo
So your query would be something like
Query
SELECT
word
FROM
en_dictionary
WHERE
CHAR_LENGTH(word) > 8
AND
word BETWEEN 'a' AND 'z'
ORDER BY
RAND()
LIMIT 10
demo on the test data
Edited on 08-11-2019
This will miss words like 'zahn' because it's not <= 'z', use 'zzzzzzz' instead
Thanks to dnoeth 's comment and a very late sparring session later on 07-11-2019 there is indeed a bug in the SQL above for some cases..
As word BETWEEN 'a' AND 'z'
is syntax sugar for word >= 'a' AND word <= 'z'
this will indeed not match words greater then z
like zahn
or zzzzzzzzz
for that matter.
A valid patch for the query above is to use REPEAT('<char>', <number_of_size_of_table_column_datatype>)
as the max range meaning in this case word BETWEEN 'a' AND REPEAT('z', 255)
the demo of the patch..
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With