Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if `LIKE` patterns intersects in Postgres

There ara two strings in some request that are patterns that used within LIKE expressions (with _ and % placeholders). I want to find if this patterns intersects (have some string that matches them both). Is there any way to do that?...


“Like pattern” corresponds to finit or infinit set of strings. Each string in this set matches to given pattern. I want to check if intersection of string sets for two given patterns is not empty. Thus it is better to say patterns conjunction. In a math language:

S — set of strings
P — set of patterns (where each pattern has one or more string representation)

Sᵢ — subset of strings (Sᵢ ⊂ S) that match pᵢ pattern (where instead of i could be any index). In equation form: “Sᵢ = {s | s ∈ S, s matches pᵢ, pᵢ ∈ P}” — that meas: “Sᵢ is a set of elements that are strings and match pᵢ pattern”. Or another notation: “Sᵢ ⊂ S, ∀pᵢ ∈ P ∀s ∈ S (s matches pᵢ ≡ s ∈ Sᵢ)” — that meas: “Sᵢ is subset of strings and any string is element of Sᵢ if it matches pᵢ pattern”.

Let's define conjunction of patterns: “p₁ ∧ p₂ = p₃ ≡ S₁ ∩ S₂ = S₃” — that means: “Set of strings that match conjunction of patterns p₁ and p₂ is intersection of sets of strings that match p₁ pattern and that match p₂ pattern”.


For example:

  • ab_d and %cd — intersects
  • k%n and kl___ — intersects
like image 795
Timofey Gorshkov Avatar asked Nov 13 '22 07:11

Timofey Gorshkov


1 Answers

I want to find if this patterns intersects (have some string that matches them both). Is there any way to do that?... (...) I want to check if intersection of string sets for two given patterns is not empty.

So, if I get this right, given two like patterns, p1 and p2, you're interested in whether there exists a (yet to be determined) string that matches p1 as well as p2.

E.g.:

select check_pattern('a%', 'b_'); -- false
select check_pattern('a%', '_b'); -- true ('ab')

Are you even sure there's a general solution to that problem in the first place?

Assuming there is, plain SQL isn't the right tool to find the solution imho, because you cannot readily express this in terms of "here's my (finite) set of data, join/filter them and yield a set based on it". To find the solution in SQL terms, you'd need to generate the set that stems from your data, and that's obviously not an option when the set in question is infinite.

Methinks you'd want to break up the problem into smaller parts and use a procedure language such as C, Perl, Lisp, whatever you fancy.

One potential solution might be this:

  • If both p1 and p2 are open on both ends or different ends, the answer is trivially yes: strings matching %foo% will intersect with those matching %bar%, just as strings matching foo% will intersect strings matching %bar.

  • If p1 yields a finite set (i.e. it contains no %), you could imagine iterating the entire set of potential matches for p1 using generate_series() or a for/while/whatever loop, and trying p2 on each string. It's ugly and inefficient, but it'll eventually work.

  • If p1 and p2 are both anchored (e.g. abc% and def% or %abc and %def), or reasonably anchored (e.g. _abc% and abcd%) the solution is trivial enough as well by considering the anchored part and proceeding as in the prior case.

  • I'll leave it to you to enumerate and solve the remaining cases if any...

The key, I think, will be to nail down the anchored parts of your patterns that yield a finite set of strings, and to stick to checking whether the (finite) set of strings they will match will intersect.

like image 146
Denis de Bernardy Avatar answered Nov 15 '22 07:11

Denis de Bernardy