Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I resolve this with pure mysql? (joining on ';' separated values in a column)

Tags:

join

mysql

To make a long story short: I have data in several tables that I need to grab together, I have simplified them for the purpose of not having to draw a huge table.

I need to do so in one query, and I can't use PHP or any other langauge to work with the results. (I would have used PHP if I could to simply work my way around this)

This wouldn't have been an issue if I were to have a link table that connects the t1 rows to t2 but unfortunately I don't and can't introduce one either.

User table: (alias t1)
user(varchar 150),resources(varchar 250)
+-------+-------+
| user1 | 1;2;4 |
+-------+-------+
| user2 | 2     |
+-------+-------+
| user3 | 3;4   |
+-------+-------+

Resources table: (alias t2)
id(int 11 AI), data(text)
+---+-------+
| 1 | data1 |
+---+-------+
| 2 | data2 |
+---+-------+
| 3 | data3 |
+---+-------+
| 4 | data4 |
+---+-------+
| 5 | data5 |
+---+-------+

Multiple users can be connected to the same resources, and users can access one or more resources.

I would like a result close to:

user,data
+-------+-------+
| user1 | data1 |
+-------+-------+
| user1 | data2 |
+-------+-------+
| user1 | data4 |
+-------+-------+
| user2 | data2 |
+-------+-------+

....etc etc.

I have basic mysql knowledge but this one is out of my knowledge scope. Is there any way I can inner join t2 ?

Threads I've read before making this post: How to join two tables using a comma-separated-list in the join field

mysql join two table with comma separated ids

like image 574
jwebdev Avatar asked Jan 07 '23 04:01

jwebdev


1 Answers

If the user_resources (t1) was a 'normalized table' with one row for each user => resource combination then the query to get the answer would be as simple as just joining the tables together.

Alas, it is denormalized by having the resources column as a: 'list of resource id' separated by a ';' character.

If we could convert the 'resources' column into rows then a lot of the difficulties go away as the table joins become simple.

The query to generate the output asked for:

SELECT user_resource.user, 
       resource.data

FROM user_resource 
     JOIN integerseries AS isequence 
       ON isequence.id <= COUNT_IN_SET(user_resource.resources, ';') /* normalize */

     JOIN resource 
       ON resource.id = VALUE_IN_SET(user_resource.resources, ';', isequence.id)      
ORDER BY
       user_resource.user,  resource.data

The Output:

user        data    
----------  --------
sampleuser  abcde   
sampleuser  azerty  
sampleuser  qwerty  
stacky      qwerty  
testuser    abcde   
testuser    azerty  

How:

The 'trick' is to have a table that contains the numbers from 1 to some limit. I call it integerseries. It can be used to convert 'horizontal' things such as: ';' delimited strings into rows.

The way this works is that when you 'join' with integerseries, you are doing a cross join, which is what happens 'naturally' with 'inner joins'.

Each row gets duplicated with a different 'sequence number' from the integerseries table which we use as an 'index' of the 'resource' in the list that we want to use for that row.

The idea is to:

  • count the number of items in the list.
  • extract each item based on its the position in the list.
  • Use integerseries to convert one row into a set of rows extracting the individual 'resource id' from user.resources as we go along.

I decided to use two functions:

  • function that given a 'delimited string list' and an 'index' will return the value at the position in the list. I call it: VALUE_IN_SET. i.e. given 'A;B;C' and an 'index' of 2 then it returns 'B'.

  • function that given a 'delimited string list' will return the count of the number of items in the list. I call it: COUNT_IN_SET. i.e. given 'A;B;C' will return 3

It turns aout that those two functions and integerseries should provide a general solution to delimited items list in a column.

Does it work?

The query to create a 'normalized' table from a ';' delimited string in column. It shows all the columns, including the generated values due to the 'cross_join' (isequence.id as resources_index):

SELECT user_resource.user, 
       user_resource.resources,
       COUNT_IN_SET(user_resource.resources, ';')                AS resources_count, 
       isequence.id                                              AS resources_index,
       VALUE_IN_SET(user_resource.resources, ';', isequence.id)  AS resources_value
FROM 
     user_resource 
     JOIN  integerseries AS isequence 
       ON  isequence.id <= COUNT_IN_SET(user_resource.resources, ';')
ORDER BY
       user_resource.user, isequence.id

The 'normalized' table output:

user        resources  resources_count  resources_index  resources_value  
----------  ---------  ---------------  ---------------  -----------------
sampleuser  1;2;3                    3                1  1                
sampleuser  1;2;3                    3                2  2                
sampleuser  1;2;3                    3                3  3                
stacky      2                        1                1  2                
testuser    1;3                      2                1  1                
testuser    1;3                      2                2  3                

Using the above 'normalized' user_resources table, it is a simple join to provide the output required:

The functions needed (these are general functions that can be used anywhere)

note: The names of these functions are related to the mysql FIND_IN_SET function. i.e. they do similar things as regards string lists?

The COUNT_IN_SET function: returns the count of character delimited items in the column.

DELIMITER $$

DROP FUNCTION IF EXISTS `COUNT_IN_SET`$$

CREATE FUNCTION `COUNT_IN_SET`(haystack VARCHAR(1024), 
                               delim CHAR(1)
                               ) RETURNS INTEGER
BEGIN
      RETURN CHAR_LENGTH(haystack) - CHAR_LENGTH( REPLACE(haystack, delim, '')) + 1;
END$$

DELIMITER ;

The VALUE_IN_SET function: treats the delimited list as a one based array and returns the value at the given 'index'.

DELIMITER $$

DROP FUNCTION IF EXISTS `VALUE_IN_SET`$$

CREATE FUNCTION `VALUE_IN_SET`(haystack VARCHAR(1024), 
                               delim CHAR(1), 
                               which INTEGER
                               ) RETURNS VARCHAR(255) CHARSET utf8 COLLATE utf8_unicode_ci
BEGIN
      RETURN  SUBSTRING_INDEX(SUBSTRING_INDEX(haystack, delim, which),
                     delim,
                     -1);
END$$

DELIMITER ;

Related Information:

  • Finally worked out how to get SQLFiddle - working code to compile functions.

  • There is a version of this that works for SQLite databases as well SQLite- Normalizing a concatenated field and joining with it?

The tables (with data):

CREATE TABLE `integerseries` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=500 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

/*Data for the table `integerseries` */

insert  into `integerseries`(`id`) values (1);
insert  into `integerseries`(`id`) values (2);
insert  into `integerseries`(`id`) values (3);
insert  into `integerseries`(`id`) values (4);
insert  into `integerseries`(`id`) values (5);
insert  into `integerseries`(`id`) values (6);
insert  into `integerseries`(`id`) values (7);
insert  into `integerseries`(`id`) values (8);
insert  into `integerseries`(`id`) values (9);
insert  into `integerseries`(`id`) values (10);

Resource:

CREATE TABLE `resource` (
  `id` int(11) NOT NULL,
  `data` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

/*Data for the table `resource` */

insert  into `resource`(`id`,`data`) values (1,'abcde');
insert  into `resource`(`id`,`data`) values (2,'qwerty');
insert  into `resource`(`id`,`data`) values (3,'azerty');

User_resource:

CREATE TABLE `user_resource` (
  `user` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `resources` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`user`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

/*Data for the table `user_resource` */

insert  into `user_resource`(`user`,`resources`) values ('sampleuser','1;2;3');
insert  into `user_resource`(`user`,`resources`) values ('stacky','3');
insert  into `user_resource`(`user`,`resources`) values ('testuser','1;3');
like image 120
Ryan Vincent Avatar answered Jan 18 '23 23:01

Ryan Vincent