
GAE Search API Now Supports Partial Searching

Since the Fall update, GAE now supports partial searching. Per the documentation: "The API supports partial text matching on string fields".

This seems to be a very popular request, judging by threads such as "Partial matching GAE search API" and "Does GAE Datastore support 'partial text search'?"

So I would assume a search for 'pyt' would now return 'python'.

Has anyone gotten this to work? It doesn't work for me. I'm curious whether some setting is required, like the ~ for stemming.

Nick Caruso asked Jan 19 '14 20:01



1 Answer

The statement "The API supports partial text matching on string fields" in https://cloud.google.com/appengine/docs/python/search/ refers to matching by tokens. Specifically, see https://cloud.google.com/appengine/docs/python/search/#Python_Tokenizing_string_fields:

The string is split into tokens wherever whitespace or special characters (punctuation marks, hash sign, etc.) appear. The index will include an entry for each token. This enables you to search for keywords and phrases comprising only part of a field's value.

Therefore your assumption:

So I would assume a search for 'pyt' would now return 'python'

is ill-founded: "partial search" means matching parts of a document (a subset of the tokens in one of its text fields), not parts of each token. Matching parts of tokens would cause a combinatorial explosion; e.g. the single token python would have to be indexed as each and every one of these entries:

p
py
pyt
pyth
pytho
python
y
yt
yth
ytho
ython
t
th
tho
thon
h
ho
hon
o
on
n

If you want that, it's easy enough to write your own code to produce the explosion (building a pseudo-document with all of these substrings from a real starting document). For any non-trivial starting document, though, you may easily end up either paying for a ridiculous amount of resources or hitting a hard ceiling of absolute maximum quotas.
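As a rough illustration of that do-it-yourself explosion, here is a minimal sketch in plain Python. The helper names all_substrings and expand_text are hypothetical (they are not part of the Search API), and splitting on whitespace is a simplifying assumption rather than the API's real tokenizer. The resulting string could be stored in an extra text field of the indexed document.

```python
def all_substrings(token):
    """Return every distinct contiguous substring of token.

    Substrings are ordered by start position, then by length.
    """
    seen = set()
    result = []
    for start in range(len(token)):
        for end in range(start + 1, len(token) + 1):
            piece = token[start:end]
            if piece not in seen:  # skip repeats, e.g. "a" in "banana"
                seen.add(piece)
                result.append(piece)
    return result


def expand_text(text):
    """Build a space-separated pseudo-field with all substrings of each word.

    Assumption: words are separated by whitespace only.
    """
    pieces = []
    for word in text.lower().split():
        pieces.extend(all_substrings(word))
    return " ".join(pieces)


substrings = all_substrings("python")
print(len(substrings))  # 21, the same entries as the list above
```

A token of length n yields up to n(n+1)/2 substrings, so the pseudo-field grows quadratically with word length, which is why the cost and quota warning applies.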

Hint: if you do a web search for "pyt", do you find docs containing "python"? Try it: the former gives 10 million hits (Peninsula Youth Theater, Michael Jackson's P.Y.T. (Pretty Young Thing), etc.), the latter 180 million hits (the language, the snake, the comedy group :-).
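To make the token-level behaviour concrete, here is a toy inverted index in plain Python. It is only a sketch of the idea in the quoted documentation, not the Search API's implementation: the function names are hypothetical, and the tokenizer (lowercase, split on anything that is not an ASCII letter or digit) is an assumption that may differ from the real one.

```python
import re


def tokenize(value):
    """Lowercase and split on runs of non-alphanumeric ASCII characters."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def build_index(docs):
    """Map each whole token to the set of document ids containing it."""
    index = {}
    for doc_id, value in docs.items():
        for token in tokenize(value):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    "doc1": "Learning Python, the hard way",
    "doc2": "Monty Python's Flying Circus",
}
index = build_index(docs)

print(search(index, "python"))         # both documents match
print(search(index, "flying python"))  # only doc2: a subset of its tokens
print(search(index, "pyt"))            # empty set: "pyt" is not a token
```

The query "flying python" matches although it covers only part of the field's value, which is the "partial" matching the documentation describes, while "pyt" finds nothing because only whole tokens are index entries.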

Alex Martelli answered Oct 06 '22 00:10
