SQL to calculate the Tanimoto Coefficient of several vectors

Question

I think it's easier to explain my problem with an example.

I have one table with ingredients for recipes and I have implemented a function to calculate the Tanimoto coefficient between ingredients. It's fast enough to calculate the coefficient between two ingredients (3 sql queries needed), but it does not scale well. To calculate the coefficient betweeen all possible ingredient's combination it needs N + (N*(N-1))/2 queries or 500500 queries for just 1k ingredients. Is there a faster way to do that? Here's what I got so far:

class Filtering():
  def __init__(self):
    self._connection=sqlite.connect('database.db')

  def n_recipes(self, ingredient_id):
    cursor = self._connection.cursor()
    cursor.execute('''select count(recipe_id) from recipe_ingredient
        where ingredient_id = ? ''', (ingredient_id, ))
    return cursor.fetchone()[0]

  def n_recipes_intersection(self, ingredient_a, ingredient_b):
    cursor = self._connection.cursor()
    cursor.execute('''select count(drink_id) from recipe_ingredient where
        ingredient_id = ? and recipe_id in (
        select recipe_id from recipe_ingredient
        where ingredient_id = ?) ''', (ingredient_a, ingredient_b))
    return cursor.fetchone()[0]

  def tanimoto(self, ingredient_a, ingredient_b):
    n_a, n_b = map(self.n_recipes, (ingredient_a, ingredient_b))
    n_ab = self.n_recipes_intersection(ingredient_a, ingredient_b)
    return float(n_ab) / (n_a + n_b - n_ab)

S.Lott · Accepted Answer

Why aren't you simply fetching all recipes into memory and then computing Tanimoto coefficients in memory?

It's simpler and it's much, much faster.

SQL to calculate the Tanimoto Coefficient of several vectors

Tags:

python

sql

collaborative-filtering

jbochi

1 Answers

S.Lott

Recent Activity

Donate For Us

SQL to calculate the Tanimoto Coefficient of several vectors

Tags:

python

sql

collaborative-filtering

jbochi

1 Answers

S.Lott

Related questions

Recent Activity

Donate For Us