I have a many-to-many (n:n) set of data, for example 'programmers' and 'languages': a programmer writes code in many languages, and a language can be used by many programmers. This data is in a table named programmers_languages.
How do I quickly select programmers who code in all of a set of languages?
More information if this is confusing:
Jon codes in C++, Pascal, and Ruby. Joe codes in C++ and Ruby. Moe codes in Ruby and Pascal. Steve codes in C++ and Pascal.
If the set of languages in question is C++ and Pascal, I would want Jon and Steve from this list.
Note the size of this set can get pretty large, so I don't want to join the table to itself n times.
Any way you shake it, there's going to be a join for each language. You're looking for a value (programmer) for which there exists at least one row for each of another value (language). That means that you need to think about N different perspectives of the same table.
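To make the "N perspectives of the same table" idea concrete, here is a rough sketch of the one-join-per-language query, built and run from Python against an in-memory SQLite database. The column names (programmer, language), the helper name programmers_by_self_join, and the use of SQLite are all assumptions for illustration; the rows are the ones from the question.

```python
import sqlite3

# Sample rows taken from the question; the column names are assumed.
ROWS = [
    ("Jon", "C++"), ("Jon", "Pascal"), ("Jon", "Ruby"),
    ("Joe", "C++"), ("Joe", "Ruby"),
    ("Moe", "Ruby"), ("Moe", "Pascal"),
    ("Steve", "C++"), ("Steve", "Pascal"),
]

def programmers_by_self_join(conn, languages):
    """Hypothetical helper: one alias of the table per requested language."""
    if not languages:
        return []
    # Alias t0 handles the first language; every further language adds a join.
    joins = [
        f"JOIN programmers_languages AS t{i} "
        f"ON t{i}.programmer = t0.programmer AND t{i}.language = ?"
        for i in range(1, len(languages))
    ]
    sql = (
        "SELECT DISTINCT t0.programmer FROM programmers_languages AS t0 "
        + " ".join(joins)
        + " WHERE t0.language = ? ORDER BY t0.programmer"
    )
    # Placeholders bind in order of appearance: joined languages, then t0's.
    params = list(languages[1:]) + [languages[0]]
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE programmers_languages (programmer TEXT, language TEXT)")
conn.executemany("INSERT INTO programmers_languages VALUES (?, ?)", ROWS)
print(programmers_by_self_join(conn, ["C++", "Pascal"]))
```

Note that the text of the query grows with the number of languages, which is exactly the part the question wants to avoid.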
In most cases, it's probably most efficient for you to just do the joins. If the result set is sufficiently dense (that is, most programmers speak Python and C++), you could resort to some cleverness. First query the disjunction, but with distinct (programmer, language) pairs, then group the resulting relation by programmer and filter out the ones that speak too few languages:
SELECT programmer
FROM ( SELECT DISTINCT programmer, language
       FROM speaks_table
       WHERE language IN ('C++', 'python') ) AS disjunction
GROUP BY disjunction.programmer
-- 2 is the number of languages listed in the IN clause above
HAVING count(disjunction.language) = 2
But whether this outperforms a regular ol' multiway join is going to depend on the exact data in question. This at least has the advantage that the shape of the query stays the same no matter how many languages are in the set; only the IN list and the count change, so you don't have to generate a new join for every language.
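For anyone who wants to try the grouped version on the data from the question, here is a minimal sketch using Python's sqlite3 module. The table name speaks_table comes from the query in this answer; the column names, the helper name programmers_with_all, and the choice of SQLite are assumptions, and it says nothing about performance on a real database.

```python
import sqlite3

# Sample rows taken from the question; the column names are assumed.
ROWS = [
    ("Jon", "C++"), ("Jon", "Pascal"), ("Jon", "Ruby"),
    ("Joe", "C++"), ("Joe", "Ruby"),
    ("Moe", "Ruby"), ("Moe", "Pascal"),
    ("Steve", "C++"), ("Steve", "Pascal"),
]

def programmers_with_all(conn, languages):
    """Hypothetical helper: programmers who speak every language given."""
    wanted = sorted(set(languages))  # drop repeats so the count is right
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT programmer
        FROM ( SELECT DISTINCT programmer, language
               FROM speaks_table
               WHERE language IN ({placeholders}) ) AS disjunction
        GROUP BY programmer
        HAVING COUNT(language) = ?
        ORDER BY programmer
    """
    # Only the IN list and the expected count vary with the input.
    return [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE speaks_table (programmer TEXT, language TEXT)")
conn.executemany("INSERT INTO speaks_table VALUES (?, ?)", ROWS)
print(programmers_with_all(conn, ["C++", "Pascal"]))
```

The inner DISTINCT matters if the table can contain the same (programmer, language) pair more than once; without it, a programmer with a repeated row could reach the count while missing a language.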