iOS app compiles SQLite FTS with ICU, but searching for a single letter like "z" returns no results

Question

In SQLite I do the following:

Create a virtual table MyTable (tokenize=icu, id text, subject text, abstract text).
Then successfully insert into MyTable (id, subject, abstract) values (?, ?, ?), so I have the row: 今天天气不错fmowomrogmeog，wfomgomrg，我是谁erz

When I run select id from MyTable where MyTable match 'z*', it does not return anything. Whenever I search for that single letter, it returns nothing. However, if I search for 'm', '天气' or '天', it works.

I know SQLite only supports prefix matching, so I am using ICU. Am I making a mistake?

Note: I've looked at the source code on Foxmail, and it looks to me like searching for ',', 'f' and so on is possible.

Hai Feng Kao · Accepted Answer

Try Hai Feng Kao's character tokenizer. It can match a prefix, a suffix, and anything in between. It supports Chinese as well. I don't think you can find any other tokenizer that supports arbitrary substring search.

BTW, this is shameless self-promotion.

If you want to open a database indexed with the character tokenizer in Objective-C, do the following:

#import <FMDB/FMDatabase.h>
#import "character_tokenizer.h"

// Open the database file through FMDB.
FMDatabase* database = [[FMDatabase alloc] initWithPath:@"my_database.db"];
if ([database open]) {
    // Add FTS support: fetch the character tokenizer module and register it
    // on this connection under the name "character".
    const sqlite3_tokenizer_module *ptr;
    get_character_tokenizer_module(&ptr);
    registerTokenizer(database.sqliteHandle, "character", ptr);
}
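
To illustrate why indexing one character at a time makes substring search possible, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not the actual code of the character tokenizer; the names utf8_seq_len and split_characters are hypothetical, and the input is assumed to be UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence that starts with `lead`.
   Invalid lead bytes are treated as a single one-byte unit. */
static size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;            /* ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;  /* most CJK characters */
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;                             /* stray continuation byte */
}

/* Hypothetical helper: stores the byte offset of each character of `text`
   in `offsets` and returns how many characters were found (at most `max`). */
size_t split_characters(const char *text, size_t *offsets, size_t max) {
    assert(text != NULL && offsets != NULL);
    size_t len = strlen(text), pos = 0, count = 0;
    while (pos < len && count < max) {
        offsets[count++] = pos;
        size_t step = utf8_seq_len((unsigned char)text[pos]);
        if (step > len - pos) step = len - pos;  /* truncated sequence at end */
        pos += step;
    }
    return count;
}
```

With one token per character, every character of the text is the start of some token, so a query for a character in the middle of a word still has a term to match.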

Qiulang 邱朗 · Answer

You may also try FMDB's FMSimpleTokenizer. FMSimpleTokenizer uses the built-in CFStringTokenizer, and according to the Apple documentation, "CFStringTokenizer allows you to tokenize strings into words, sentences or paragraphs in a language-neutral way. It supports languages such as Japanese and Chinese that do not delimit words by spaces."

If you check the FMSimpleTokenizer code, you will find that this is done by calling CFStringTokenizerAdvanceToNextToken and CFStringTokenizerGetCurrentTokenRange.

One interesting "fact" is how CFStringTokenizer tokenizes Chinese words. For example, "欢迎使用" will be tokenized into "欢迎" and "使用", which totally makes sense, but if you search for "迎", you will be surprised to see no result at all!

In that case, you probably need to write a tokenizer like Hai Feng Kao's SQLite tokenizer.
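
The "迎" surprise can be modelled with a small C sketch. It assumes, as a simplification, that a query term matches only when it is a prefix of some indexed token. The function any_token_has_prefix is a hypothetical name for illustration and is not part of SQLite, FMDB or CFStringTokenizer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `query` is a prefix of at least one token.
   Comparing raw bytes is enough when both sides are valid UTF-8. */
bool any_token_has_prefix(const char *const *tokens, size_t count,
                          const char *query) {
    assert(tokens != NULL && query != NULL);
    size_t qlen = strlen(query);
    for (size_t i = 0; i < count; i++) {
        if (strncmp(tokens[i], query, qlen) == 0) {
            return true;
        }
    }
    return false;
}
```

Given the tokens "欢迎" and "使用", a query for "欢" matches under this model because it starts the first token, while a query for "迎" matches nothing because no token begins with it.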

Tags:

sqlite

ios

icu

user1243169
