In a MySQL 5.6 database I have a table, tablename, which has (among others) three TEXT columns: col_a, col_b, col_c.
I want to extract all unique words from these three columns that are at least 5 characters long, where words are separated by spaces. By "word" I mean any string of non-space characters, e.g. "foo-123" would be a word, as would "099423". The table is InnoDB and the columns are all utf8.
Is there a single query to do this?
EDIT: As requested, here's an example: (in the real data col_a, col_b and col_c are TEXT fields and could have a large number of words.)
select id, col_a, col_b, col_c from tablename;
id  | col_a              | col_b          | col_c
----|--------------------|----------------|----------------------
1   | apple orange plum  | red green blue | bill dave sue
2   | orange plum banana | yellow red     | frank james
3   | kiwi fruit apple   | green pink     | bill sarah-jane frank
expected_result: ["apple", "orange", "banana", "fruit",
"green", "yellow", "frank", "james", "sarah-jane"]
I don't care about the order of results. Thanks!
EDIT: in my example above everything is in lowercase, because that is how I store everything in the real-life table this question relates to. If the data did contain some capitalisation, though, I would prefer the query to ignore case (which is how my DB is configured, as it happens).
EDIT2: in case it helps, all of the text columns have a FULLTEXT index on them.
EDIT3: here is the SQL to create the sample data:
DROP TABLE IF EXISTS `tablename`;
CREATE TABLE `tablename` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`col_a` text,
`col_b` text,
`col_c` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;
LOCK TABLES `tablename` WRITE;
INSERT INTO `tablename` VALUES (1,'apple orange plum','red green blue','bill dave sue'),(2,'orange plum banana','yellow red','frank james'),(3,'kiwi fruit apple','green pink','bill sarah-jane frank');
UNLOCK TABLES;
Answer. Yes, the DISTINCT clause can be applied to any valid SELECT query. It is important to note that DISTINCT filters out rows that duplicate another row across all selected columns.
To get the unique (distinct) values of a column in a MySQL table, use the following SQL query: SELECT DISTINCT column_name FROM your_table_name; You can select distinct values for one or more columns. The column names have to be separated with commas.
The best solution is to stop storing data in that structure and normalize the database in compliance with the normal forms. But if you need to split the strings into words and return them as a table, cannot normalize the database, and cannot use a newer MySQL version with CTEs (8.0 or later), you could create a simple stored procedure that splits a string and stores the words in a temporary table. For example, the stored procedure might look like:
DELIMITER //
CREATE PROCEDURE split_string_to_table (str LONGTEXT)
BEGIN
    DECLARE val TEXT DEFAULT NULL;
    DROP TEMPORARY TABLE IF EXISTS temp_values;
    CREATE TEMPORARY TABLE temp_values (
        `value` varchar(200)
    );
    iterator:
    LOOP
        -- Stop when the input is NULL or only spaces are left
        IF str IS NULL OR CHAR_LENGTH(TRIM(str)) = 0 THEN
            LEAVE iterator;
        END IF;
        -- Take everything up to the first space
        SET val = SUBSTRING_INDEX(str, ' ', 1);
        INSERT INTO temp_values (`value`) VALUES (TRIM(val));
        -- Remove the word and the space after it. INSERT() counts characters,
        -- so use CHAR_LENGTH() rather than the byte-based LENGTH() for utf8 data
        SET str = INSERT(str, 1, CHAR_LENGTH(val) + 1, '');
    END LOOP;
    SELECT DISTINCT `value` FROM temp_values WHERE CHAR_LENGTH(`value`) >= 5;
END //
DELIMITER ;
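After that, you can concatenate all the column values into one string, store it in a user variable, and pass that variable to the stored procedure. Note that GROUP_CONCAT truncates its result at group_concat_max_len (1024 by default), so for real data that setting has to be raised first: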
SELECT CONCAT_WS(' ',
    GROUP_CONCAT(col_a SEPARATOR ' '),
    GROUP_CONCAT(col_b SEPARATOR ' '),
    GROUP_CONCAT(col_c SEPARATOR ' ')
) INTO @text
FROM tablename;
CALL split_string_to_table(@text);
Result:
+------------+
| value      |
+------------+
| apple      |
| orange     |
| banana     |
| fruit      |
| green      |
| yellow     |
| frank      |
| james      |
| sarah-jane |
+------------+
You can see a demo of this implementation on DBFiddle.
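As a cross-check outside the database, the same rules (split on spaces, keep words of at least 5 characters, ignore case, de-duplicate) can be sketched in a few lines of Python. This is only a minimal sketch for comparing against the procedure's output: the function name unique_words is hypothetical, and the rows are assumed to have been fetched from tablename already.

```python
def unique_words(rows, min_len=5):
    """Return distinct lowercased words of at least min_len characters.

    rows is an iterable of tuples of column values (col_a, col_b, col_c).
    Words are split on single spaces only, as defined in the question.
    """
    seen = set()
    result = []
    for row in rows:
        for text in row:
            # NULL columns arrive as None; treat them as empty text
            for word in (text or "").split(" "):
                key = word.lower()  # case-insensitive comparison
                if len(word) >= min_len and key not in seen:
                    seen.add(key)
                    result.append(key)
    return result


# Sample data from the question (assumed already fetched from tablename)
rows = [
    ("apple orange plum", "red green blue", "bill dave sue"),
    ("orange plum banana", "yellow red", "frank james"),
    ("kiwi fruit apple", "green pink", "bill sarah-jane frank"),
]

print(unique_words(rows))
```

The words come back in first-seen order, which is fine here because the question does not care about ordering.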
From your performance requirements and comments, it appears that you need to run this query regularly. Unfortunately, your data just isn't at the right resolution to do this neatly or succinctly.
I would consider adding a summary table of sorts to assist with the final query. By maintaining the summary table as and when data in the main table changes, you should be able to keep things simpler.
A suggested format for this summary table would be:
id
, main_table_id
, column_name
, word
where main_table_id is a foreign key to your main table's id column.
You could also place a composite unique index on (main_table_id, column_name, word).
On editing a relevant column value in the main table, you should adjust the summary table rows for the affected main_table_id and column_name (presumably by removing the existing words for that pair and inserting the new ones).
This could be done either at the application level or with a trigger.
This would make the final query much simpler:
SELECT DISTINCT word
FROM summary_table
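The application-level option could look roughly like the sketch below. It is an illustration under stated assumptions, not the answerer's code: the helper names words_for and refresh_summary are hypothetical, an in-memory SQLite database stands in for MySQL so that the example runs on its own, and only words of at least 5 characters are stored so that the final query needs no length filter.

```python
import sqlite3

MIN_LEN = 5  # minimum word length, taken from the question


def words_for(text, min_len=MIN_LEN):
    """Return the distinct lowercased words of at least min_len characters."""
    return {w.lower() for w in (text or "").split(" ") if len(w) >= min_len}


def refresh_summary(conn, main_table_id, column_name, text):
    """Replace the summary rows for one (main_table_id, column_name) pair."""
    with conn:  # one transaction: delete the old words, then insert the new ones
        conn.execute(
            "DELETE FROM summary_table WHERE main_table_id = ? AND column_name = ?",
            (main_table_id, column_name),
        )
        conn.executemany(
            "INSERT INTO summary_table (main_table_id, column_name, word)"
            " VALUES (?, ?, ?)",
            [(main_table_id, column_name, w) for w in sorted(words_for(text))],
        )


# Assumption: SQLite replaces MySQL here purely to keep the demo self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE summary_table ("
    " id INTEGER PRIMARY KEY,"
    " main_table_id INTEGER NOT NULL,"
    " column_name TEXT NOT NULL,"
    " word TEXT NOT NULL,"
    " UNIQUE (main_table_id, column_name, word))"
)

# Sample rows from the question, keyed by the main table's id
sample = {
    1: ("apple orange plum", "red green blue", "bill dave sue"),
    2: ("orange plum banana", "yellow red", "frank james"),
    3: ("kiwi fruit apple", "green pink", "bill sarah-jane frank"),
}
for row_id, values in sample.items():
    for column_name, text in zip(("col_a", "col_b", "col_c"), values):
        refresh_summary(conn, row_id, column_name, text)

# Simulate an edit of row 2's col_b: "yellow" goes away, "purple" appears
refresh_summary(conn, 2, "col_b", "purple red")

distinct_words = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT word FROM summary_table")
)
print(distinct_words)
```

Deleting and re-inserting per (main_table_id, column_name) pair keeps each update small, and the composite unique index guards against duplicate rows if the refresh logic is ever called incorrectly.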