
SQLite FTS4 with preferred language

I have an SQLite table that was generated using the FTS4 module. Each entry is listed at least twice, in different languages, but the versions still share a unique ID (an int column, not indexed). Here is what I want to do: look up a term in a preferred language, then union the result with a lookup of the same term in another language. For the second lookup, though, I want to ignore all entries (identified by their ID) that I already found in the first lookup. So basically I want to do this:

WITH term_search1 AS (
    SELECT *
    FROM myFts
    WHERE myFts MATCH 'term'
    AND languageId = 1)
SELECT *
FROM term_search1
UNION
SELECT *
FROM myFts
WHERE myFts MATCH 'term'
AND languageId = 2
AND id NOT IN (SELECT id FROM term_search1)

The problem is that the term_search1 query would be executed twice. Is there a way to materialize my results? Any solution that limits this to 2 queries (instead of 3) would be great.
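The intended result can be pinned down with a small plain-Python model of the two-pass rule, working on an in-memory list of rows that already matched 'term'. This is only a sketch of the semantics, not of how SQLite evaluates the query; the function name `two_pass` is illustrative, and the sample rows are the matching rows from the test data in the second answer.

```python
def two_pass(matches, preferred=1, fallback=2):
    """Model of the desired lookup.

    matches: (id, languageId, content) tuples that already matched the term.
    Returns the preferred-language rows first, then fallback-language rows
    whose id was not found in the first pass.
    """
    first = [row for row in matches if row[1] == preferred]
    seen_ids = {row[0] for row in first}
    second = [row for row in matches
              if row[1] == fallback and row[0] not in seen_ids]
    return first + second


# Rows matching 'term' in the test data used later in this thread.
matches = [
    (10, 1, "term 10 lang 1"),
    (10, 2, "term 10 lang 2"),
    (11, 1, "term 11 lang 1"),
    (12, 2, "term 12 lang 2"),
    (13, 2, "term 13 lang 2"),
]

if __name__ == "__main__":
    for row in two_pass(matches):
        print(row)
```

Swapping `preferred` and `fallback` gives the same lookup with language 2 as the preferred one.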

I also tried using recursive queries, something like:

WITH RECURSIVE term_search1 AS (
    SELECT *
    FROM myFts
    WHERE myFts MATCH 'term'
    AND languageId = 1
UNION ALL
    SELECT m.*
    FROM myFts m LEFT OUTER JOIN term_search1 t ON (m.id = t.id)
    WHERE myFts MATCH 'term'
    AND m.languageId = 2
    AND t.id IS NULL
)
SELECT * FROM term_search1

This didn't work either. Apparently SQLite just executed two lookups for languageId = 2 (is this a bug maybe?).

Thanks in advance :)

asked Mar 17 '15 12:03 by Peach


2 Answers

You can use a TEMPORARY table to reduce the number of full-text (MATCH) queries against myFts to 2:

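CREATE TEMP TABLE results (id INTEGER PRIMARY KEY, ftsRowid INTEGER NOT NULL);

-- Full-text query 1: the preferred language.
INSERT INTO results (id, ftsRowid)
    SELECT id, docid FROM myFts
    WHERE myFts MATCH 'term' AND languageId = 1;

-- Full-text query 2: the other language, skipping ids already found.
INSERT INTO results (id, ftsRowid)
    SELECT id, docid FROM myFts
    WHERE myFts MATCH 'term' AND languageId = 2
    AND id NOT IN (SELECT id FROM results);

-- Fetch exactly the matched rows by docid (a rowid lookup, no third MATCH).
-- Selecting by id instead would also return the other translations.
SELECT myFts.* FROM results
    JOIN myFts ON myFts.docid = results.ftsRowid;

DROP TABLE results;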
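A runnable sketch of a variant of the temporary-table approach, using Python's built-in sqlite3 module. It assumes the SQLite library linked into Python was compiled with FTS4, and it reuses the asker's column names and the test rows from the second answer. Each hit's docid is stored next to its id so the last step fetches exactly the matched rows by rowid instead of running a third full-text query; the helpers `build_demo_table` and `search_with_fallback` and the `ftsRowid` column are hypothetical names.

```python
import sqlite3


def build_demo_table(conn):
    # Same schema and rows as the test in the second answer (needs FTS4).
    conn.execute("CREATE VIRTUAL TABLE myFts USING fts4("
                 "id integer, languageId integer, content TEXT)")
    conn.executemany(
        "INSERT INTO myFts (id, languageId, content) VALUES (?, ?, ?)",
        [(10, 1, "term 10 lang 1"), (10, 2, "term 10 lang 2"),
         (11, 1, "term 11 lang 1"), (12, 2, "term 12 lang 2"),
         (13, 1, "not_erm 13 lang 1"), (13, 2, "term 13 lang 2")],
    )


def search_with_fallback(conn, term, preferred=1, fallback=2):
    conn.execute("CREATE TEMP TABLE results ("
                 "id INTEGER PRIMARY KEY, ftsRowid INTEGER NOT NULL)")
    try:
        # Full-text query 1: preferred language.
        conn.execute(
            "INSERT INTO results (id, ftsRowid) "
            "SELECT id, docid FROM myFts "
            "WHERE myFts MATCH ? AND languageId = ?",
            (term, preferred))
        # Full-text query 2: fallback language, skipping ids already stored.
        conn.execute(
            "INSERT INTO results (id, ftsRowid) "
            "SELECT id, docid FROM myFts "
            "WHERE myFts MATCH ? AND languageId = ? "
            "AND id NOT IN (SELECT id FROM results)",
            (term, fallback))
        # Fetch the stored rows through docid lookups, not another MATCH.
        return conn.execute(
            "SELECT myFts.id, myFts.languageId, myFts.content "
            "FROM results JOIN myFts ON myFts.docid = results.ftsRowid "
            "ORDER BY results.id").fetchall()
    finally:
        conn.execute("DROP TABLE temp.results")


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    build_demo_table(demo)
    for row in search_with_fallback(demo, "term"):
        print(row)
```

The `try`/`finally` drops the temporary table even if a query fails, so the function can be called repeatedly on the same connection.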

If it's possible to change the schema, keep only text data in the FTS table. A MATCH against the table name searches every column, so a search for a number can also hit the id or languageId columns and return rows you did not want. Create a separate meta table holding the non-textual data (like id and languageId) and filter the rows by joining it against the rowid (docid) of myFts. This way you need to query the FTS table only once: store the FTS results in the temporary table, then use the meta table to choose and order them.
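A minimal sketch of that schema split, again with Python's sqlite3 and an FTS4-enabled SQLite. The table names `docs`, `meta` and `hits` and the helper functions are hypothetical. Picking the preferred translation with a NOT EXISTS check over the temporary table is one possible way to do the ordering step, not necessarily what the answer intends.

```python
import sqlite3

# Hypothetical schema: only text in the FTS4 table, numbers in "meta".
SCHEMA = """
CREATE VIRTUAL TABLE docs USING fts4(content);
CREATE TABLE meta (
    docid      INTEGER PRIMARY KEY,  -- same value as docs.docid
    id         INTEGER NOT NULL,     -- entry id shared by all translations
    languageId INTEGER NOT NULL
);
CREATE INDEX meta_id_lang ON meta (id, languageId);
"""


def add_entry(conn, docid, entry_id, language_id, content):
    conn.execute("INSERT INTO docs (docid, content) VALUES (?, ?)",
                 (docid, content))
    conn.execute("INSERT INTO meta (docid, id, languageId) VALUES (?, ?, ?)",
                 (docid, entry_id, language_id))


def search(conn, term, preferred=1, fallback=2):
    conn.execute("CREATE TEMP TABLE hits (docid INTEGER PRIMARY KEY)")
    try:
        # The only full-text query: remember which documents matched.
        conn.execute("INSERT INTO hits SELECT docid FROM docs WHERE docs MATCH ?",
                     (term,))
        # Pick one translation per id via meta; docs is touched again
        # only through docid (rowid) lookups.
        return conn.execute("""
            SELECT m.id, m.languageId, d.content
            FROM hits h
            JOIN meta m ON m.docid = h.docid
            JOIN docs d ON d.docid = h.docid
            WHERE m.languageId = :pref
               OR (m.languageId = :fb AND NOT EXISTS (
                     SELECT 1 FROM hits h2 JOIN meta m2 ON m2.docid = h2.docid
                     WHERE m2.id = m.id AND m2.languageId = :pref))
            ORDER BY m.id
        """, {"pref": preferred, "fb": fallback}).fetchall()
    finally:
        conn.execute("DROP TABLE temp.hits")


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.executescript(SCHEMA)
    add_entry(demo, 1, 10, 1, "term 10 lang 1")
    add_entry(demo, 2, 10, 2, "term 10 lang 2")
    print(search(demo, "term"))
```

Because the numbers live in `meta`, a search such as `search(conn, "1")` only finds documents whose text contains the token `1`; it can no longer hit a languageId column.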

answered Oct 10 '22 05:10 by Paras


This is the best I can think of:

SELECT *
FROM myFts t1
JOIN (SELECT COUNT(*) AS cnt, id 
      FROM myFts t2
      WHERE t2.languageId in (1, 2) 
      AND t2.myFts MATCH 'term'
      GROUP BY t2.id) t3
ON t1.id = t3.id
WHERE t1.myFts MATCH 'term'
    AND t1.languageId in (1, 2) 
    AND (t1.languageId = 1 or t3.cnt = 1)

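The second MATCH clause (on t1) is necessary: without it, an id that matched in only one language (cnt = 1) would also return its non-matching translation. The idea is to first count, per id, the rows that match in an acceptable language, then choose the best one: the languageId = 1 row, or the only matching row when there is just one.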
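The count-then-choose rule can be sketched in plain Python over the rows that already passed the MATCH and the languageId IN (1, 2) filter. Like the query, it assumes at most one row per (id, language) and only the two languages 1 and 2; the function name `count_then_choose` is illustrative.

```python
from collections import Counter


def count_then_choose(matches):
    """matches: (id, languageId, content) rows that matched in language 1 or 2."""
    # Counterpart of the t3 subquery: number of acceptable rows per id.
    counts = Counter(entry_id for entry_id, _, _ in matches)
    # Keep the language-1 row, or the only row when an id matched just once.
    return [row for row in matches if row[1] == 1 or counts[row[0]] == 1]


if __name__ == "__main__":
    rows = [
        (10, 1, "term 10 lang 1"),
        (10, 2, "term 10 lang 2"),
        (11, 1, "term 11 lang 1"),
        (12, 2, "term 12 lang 2"),
        (13, 2, "term 13 lang 2"),
    ]
    for row in count_then_choose(rows):
        print(row)
```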

Edit: I have no idea why it does not work with your table. This is how I tested it (SQLite version 3.8.10.2):

CREATE VIRTUAL TABLE myFts USING fts4(
  id integer,
  languageId integer,
  content TEXT
);

insert into myFts(id, languageId, content) values (10, 1, 'term 10 lang 1');
insert into myFts(id, languageId, content) values (10, 2, 'term 10 lang 2');
insert into myFts(id, languageId, content) values (11, 1, 'term 11 lang 1');
insert into myFts(id, languageId, content) values (12, 2, 'term 12 lang 2');
insert into myFts(id, languageId, content) values (13, 1, 'not_erm 13 lang 1');
insert into myFts(id, languageId, content) values (13, 2, 'term 13 lang 2');

Executing the query gives:

sqlite> SELECT *
   ...> FROM myFts t1
   ...> JOIN (SELECT COUNT(*) AS cnt, id 
   ...>       FROM myFts t2
   ...>       WHERE t2.languageId in (1, 2) 
   ...>       AND t2.myFts MATCH 'term'
   ...>       GROUP BY t2.id) t3
   ...> ON t1.id = t3.id
   ...> WHERE t1.myFts MATCH 'term'
   ...>     AND t1.languageId in (1, 2) 
   ...>     AND (t1.languageId = 1 or t3.cnt = 1);
10|1|term 10 lang 1|2|10
11|1|term 11 lang 1|1|11
12|2|term 12 lang 2|1|12
13|2|term 13 lang 2|1|13
sqlite> 
answered Oct 10 '22 03:10 by bwt