How do I select noncontiguous characters from a string of text in MySQL?

I have a table with millions of rows and a single column of text that is exactly 11,159 characters long. It looks like this:

1202012101...(to 11,159 characters)
1202020120...
0121210212...
...
(to millions of rows)

I realize that I can use

SELECT SUBSTR(column,2,4) FROM table;

...if I wanted to pull out characters 2, 3, 4, and 5:

1202012101...
1202020120...
0121210212...
 ^^^^
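For reference, MySQL's SUBSTR(str, pos, len) counts positions from 1, so SUBSTR(column,2,4) covers characters 2 through 5. Here is a minimal Python sketch of the same arithmetic. It assumes a positive start position, and the name mysql_substr is hypothetical:

```python
def mysql_substr(s: str, pos: int, length: int) -> str:
    """Mimic MySQL SUBSTR(s, pos, length) for a positive, 1-based pos."""
    start = pos - 1  # convert the 1-based position to a 0-based index
    return s[start:start + length]


print(mysql_substr("1202012101", 2, 4))  # prints 2020 (characters 2-5)
```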

But I need to extract noncontiguous characters, e.g. characters 1,5,7:

1202012101...
1202020120...
0121210212...
^   ^ ^
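In other words, the wanted output per row is just the characters at the listed 1-based positions, concatenated in order. Here is a small Python sketch of that semantics. The helper name characters is hypothetical. Slicing returns an empty string for a position past the end, much as SUBSTR does:

```python
def characters(s: str, *positions: int) -> str:
    """Return the characters of s at the given 1-based positions, in order."""
    # s[p - 1:p] yields '' instead of raising for positions beyond the end
    return "".join(s[p - 1:p] for p in positions)


rows = ["1202012101", "1202020120", "0121210212"]
for row in rows:
    print(characters(row, 1, 5, 7))  # 102, then 100, then 020
```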

I realize this can be done with a query like:

SELECT CONCAT(SUBSTR(colm,1,1),SUBSTR(colm,5,1),SUBSTR(colm,7,1)) FROM table;

But this query gets very unwieldy to build when I need to select thousands of characters. So, for the first part of the question: how do I build a query that does something like this?

SELECT CHARACTERS(string,1,5,7) FROM table;
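MySQL has no built-in CHARACTERS function. It is the hypothetical function being asked for. One stop-gap is to generate the long CONCAT(SUBSTR(...)) query from a list of positions instead of typing it by hand. The sketch below does this in Python and assumes the positions are known integers. build_concat_query is a hypothetical helper, and the column and table names are the placeholders used above:

```python
def build_concat_query(column: str, table: str, positions: list[int]) -> str:
    """Expand a list of 1-based positions into one CONCAT(SUBSTR(...)) query."""
    parts = ",".join(f"SUBSTR({column},{p},1)" for p in positions)
    return f"SELECT CONCAT({parts}) FROM {table};"


print(build_concat_query("colm", "table", [1, 5, 7]))
# SELECT CONCAT(SUBSTR(colm,1,1),SUBSTR(colm,5,1),SUBSTR(colm,7,1)) FROM table;
```

This only moves the unwieldiness into a script, but it makes the query reproducible for any list of positions.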

Furthermore, the indices of the characters I want to select are from a different table that looks something like this:

char_index   keep_or_discard
1            keep
2            discard
3            discard
4            discard
5            keep
7            discard
8            keep
9            discard
10           discard

So for the second part of the question, how could I build a query to select specific characters from the first table based on whether keep_or_discard="keep" for that character's index in the second table?
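Conceptually, the second part has two steps. First, collect the indices marked "keep" in ascending order. Then pick those characters from each row. A Python sketch of that logic follows. It uses the example index table above as in-memory data, and index_table, kept and select_kept are illustrative names, not part of any real schema:

```python
# (char_index, keep_or_discard) pairs copied from the example table above
index_table = [
    (1, "keep"), (2, "discard"), (3, "discard"), (4, "discard"), (5, "keep"),
    (7, "discard"), (8, "keep"), (9, "discard"), (10, "discard"),
]

# Kept positions in ascending order, like WHERE keep_or_discard='keep' ORDER BY char_index
kept = sorted(i for i, flag in index_table if flag == "keep")


def select_kept(s: str, positions: list[int]) -> str:
    """Concatenate the characters of s at the given 1-based positions."""
    return "".join(s[p - 1:p] for p in positions)


print(kept)  # [1, 5, 8]
print(select_kept("1202012101", kept))  # 101
```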

asked Nov 13 '22 21:11 by Stephen Turner


1 Answer

This function does what you want:

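-- DELIMITER is only needed when running this from the mysql command-line client
DELIMITER //
CREATE DEFINER = `root`@`localhost` FUNCTION `test`.`getsubset`(selection mediumtext, longstring mediumtext)
RETURNS mediumtext -- varchar(200) would cut off results longer than 200 characters
LANGUAGE SQL
DETERMINISTIC -- the same inputs always give the same result
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT 'This function returns a subset of characters.'
BEGIN
  SET @res:='';               -- characters collected so far
  SET @selection:=selection;  -- positions still to process, e.g. '1,3,5'
  WHILE @selection<>'' DO
    -- CONVERT keeps only the leading integer of '1,3,5' (and raises a warning)
    SET @pos:=CONVERT(@selection, signed);
    -- append the single character at 1-based position @pos
    SET @res:=CONCAT_WS('',@res,SUBSTRING(longstring,@pos,1));
    -- drop the first position (and its comma) from the list
    IF LOCATE(',',@selection)=0 THEN
      SET @selection:='';
    ELSE
      SET @selection:=SUBSTRING(@selection,LOCATE(',',@selection)+1);
    END IF;
  END WHILE;
  RETURN @res;
END //
DELIMITER ;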

Note: CONVERT('1,2,3,4', SIGNED) yields 1, the leading integer, but raises a truncation warning.
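To illustrate that conversion rule, here is a rough Python model of CONVERT(s, SIGNED) applied to a string. It is an approximation that assumes input made of digits and commas, not MySQL's exact algorithm, and convert_signed is a hypothetical name. The model takes the leading integer and reports whether trailing text was dropped, which is the case that triggers the warning:

```python
import re


def convert_signed(s: str) -> tuple[int, bool]:
    """Rough model of MySQL CONVERT(s, SIGNED) on a string.

    Returns the leading integer (0 if there is none) and whether any
    trailing text was dropped, i.e. whether a truncation warning is expected.
    """
    m = re.match(r"\s*[+-]?\d+", s)
    if m is None:
        return 0, True
    return int(m.group()), m.end() < len(s)


print(convert_signed("1,2,3,4"))  # (1, True): value 1, with a warning
print(convert_signed("6"))        # (6, False): clean conversion
```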

I created it in the database test.

The function takes two parameters: a string (not a number!) containing a comma-separated list of positions, and the long string from which the characters are taken.
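The loop in getsubset can be easier to follow when transcribed to Python. The sketch below does this. It assumes a well-formed selection such as '1,3,5,6', and int() on the text before the first comma stands in for CONVERT(..., SIGNED):

```python
def getsubset(selection: str, longstring: str) -> str:
    """Python mirror of the getsubset() loop: walk a comma-separated list
    of 1-based positions and collect one character per position."""
    res = ""
    while selection != "":
        # leading integer of e.g. "3,5,6" (what CONVERT(..., SIGNED) returns)
        pos = int(selection.split(",", 1)[0])
        res += longstring[pos - 1:pos]  # like SUBSTRING(longstring, pos, 1)
        # drop the first position (and its comma) from the list
        if "," not in selection:
            selection = ""
        else:
            selection = selection[selection.index(",") + 1:]
    return res


print(getsubset("1,3,5,6", "abcdefghijklmnopq"))  # acef
print(getsubset("1,3,5,6", "123456789"))          # 1356
```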

An example of using this:

mysql> select * from keepdiscard;
+---------+------------+
| charind | keepordisc |
+---------+------------+
|       1 | keep       |
|       2 | discard    |
|       3 | keep       |
|       4 | discard    |
|       5 | keep       |
|       6 | keep       |
+---------+------------+
6 rows in set (0.00 sec)

mysql> select * from test;
+-------------------+
| longstring        |
+-------------------+
| abcdefghijklmnopq |
| 123456789         |
+-------------------+
2 rows in set (0.00 sec)

mysql> select getsubset(group_concat(charind ORDER BY charind),longstring) as result from keepdiscard, test  where keepordisc='keep' group by longstring;
+--------+
| result |
+--------+
| 1356   |
| acef   |
+--------+
2 rows in set, 6 warnings (0.00 sec)

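The warnings stem from the quick conversion to integer done in the function (see the note above). Also keep in mind that GROUP_CONCAT truncates its result to group_concat_max_len bytes (1024 by default), so with thousands of kept positions you need to raise that limit first, e.g. SET SESSION group_concat_max_len = 1000000;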
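The count of six warnings is consistent with that explanation. Every remaining list except the last one still contains a comma, so for '1,3,5,6' three of the four conversions drop trailing text. With two rows in test, that gives 3 × 2 = 6. Here is a small Python sketch of that count. It assumes one warning per truncated CONVERT, and truncated_conversions is a hypothetical helper:

```python
def truncated_conversions(selection: str) -> int:
    """Count loop iterations in which CONVERT(..., SIGNED) sees trailing text
    (i.e. the remaining list still contains a comma)."""
    count = 0
    while selection != "":
        if "," in selection:
            count += 1
            selection = selection[selection.index(",") + 1:]
        else:
            selection = ""
    return count


per_row = truncated_conversions("1,3,5,6")
print(per_row)      # 3
print(per_row * 2)  # 6, one call per row of `test`
```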

answered Nov 17 '22 07:11 by Eljakim