Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL to calculate the Tanimoto Coefficient of several vectors

I think it's easier to explain my problem with an example.

I have one table with ingredients for recipes and I have implemented a function to calculate the Tanimoto coefficient between ingredients. It's fast enough to calculate the coefficient between two ingredients (3 sql queries needed), but it does not scale well. To calculate the coefficient betweeen all possible ingredient's combination it needs N + (N*(N-1))/2 queries or 500500 queries for just 1k ingredients. Is there a faster way to do that? Here's what I got so far:

class Filtering():
  def __init__(self):
    self._connection=sqlite.connect('database.db')

  def n_recipes(self, ingredient_id):
    cursor = self._connection.cursor()
    cursor.execute('''select count(recipe_id) from recipe_ingredient
        where ingredient_id = ? ''', (ingredient_id, ))
    return cursor.fetchone()[0]

  def n_recipes_intersection(self, ingredient_a, ingredient_b):
    cursor = self._connection.cursor()
    cursor.execute('''select count(drink_id) from recipe_ingredient where
        ingredient_id = ? and recipe_id in (
        select recipe_id from recipe_ingredient
        where ingredient_id = ?) ''', (ingredient_a, ingredient_b))
    return cursor.fetchone()[0]

  def tanimoto(self, ingredient_a, ingredient_b):
    n_a, n_b = map(self.n_recipes, (ingredient_a, ingredient_b))
    n_ab = self.n_recipes_intersection(ingredient_a, ingredient_b)
    return float(n_ab) / (n_a + n_b - n_ab)
like image 859
jbochi Avatar asked Feb 27 '23 20:02

jbochi


1 Answers

Why aren't you simply fetching all recipes into memory and then computing Tanimoto coefficients in memory?

It's simpler and it's much, much faster.

like image 67
S.Lott Avatar answered Mar 02 '23 13:03

S.Lott