Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL select and group records with same words in different order

I have the following values in a table:

Table Fruits

id | name                        | price
-----------------------------------------
 1 | 'APPLE BANANA ORANGE'       | 12.00
 2 | 'BANANA ORANGE APPLE'       |  4.00
 3 | 'ORANGE APPLE BANANA'       | 10.00
 4 | 'LEMON APPLE BANANA ORANGE' |  7.00
 5 | 'APPLE LEMON BANANA ORANGE' |  8.00

I want to select all the values of the 3 first rows and all I have is a text containing 'APPLE BANANA ORANGE'

Example

SELECT * 
FROM fruits 
WHERE name IN 
('APPLE BANANA ORANGE','BANANA ORANGE APPLE','ORANGE APPLE BANANA')

The thing is that those values comes from another table and i'm looking a way to generate the different combinations of possible values

Maybe using a function:

SELECT * 
FROM fruits f 
INNER JOIN order o ON o.name IN some_function(f.name)

Maybe using some regex:

SELECT * 
FROM fruits f 
INNER JOIN order o ON o.name ~ '(?=' || f.name || ')'

I've tried to use some regex found over the internet that uses lookaround and they bring all the values that contains those three words but includes the one that also have more words. Like 'LEMON APPLE BANANA ORANGE' and 'APPLE LEMON BANANA ORANGE'

The other thing is that I want to group those results no matter the order that they appear like

SELECT sum(price) 
FROM fruits f 
INNER JOIN order o ON o.name ~ '(?=' || f.name || ')'
GROUP BY somefunction(name);

Thanks for you help

Marcel Isaac

UPDATE

I've created a function that splits the content of the string, orders the words and return a new string with the ordered words

CREATE OR REPLACE FUNCTION order_words(txt character varying) 
returns character varying as
$BODY$
DECLARE 
    tmp character varying;
BEGIN
    SELECT string_agg(t,' ') 
    INTO tmp
    FROM (
      SELECT rstt 
      FROM regexp_split_to_table(txt, ' ') rstt 
      ORDER BY rstt) t; 

    RETURN tmp;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

When executed on the values it always returns the same string

select order_words('APPLE BANANA ORANGE');
    order_words     
---------------------
APPLE BANANA ORANGE
(1 row)

select order_words('BANANA ORANGE APPLE');
    order_words     
---------------------
APPLE BANANA ORANGE
(1 row)

select order_words('ORANGE APPLE BANANA');
    order_words     
---------------------
APPLE BANANA ORANGE
(1 row)

Now I can write my code

SELECT order_words(name),sum(price) 
FROM fruits f 
INNER JOIN order o ON order_words(o.name) = order_words(f.name)
GROUP BY order_words(name);

I'll test for performance

like image 205
Marcel Isaac Avatar asked Nov 09 '22 13:11

Marcel Isaac


1 Answers

Not sure if this would be faster than your solution:

select f.*
from fruits f
  join orders o 
    on string_to_array(f.name, ' ') @> string_to_array(o.name, ' ')
   and cardinality(string_to_array(f.name, ' ')) = cardinality(string_to_array(o.name, ' '));

The idea is to split both values into array and check if they overlap. But because "overlaps" doesn't mean all elements are equal, I'm comparing the length of the array as well. If they are of equal length and overlap all elements are equal.

The expression string_to_array(f.name, ' ') can be indexed and a GIN index can even be used by the @> operator.

like image 86
a_horse_with_no_name Avatar answered Nov 14 '22 21:11

a_horse_with_no_name