Proper use of MySQL full text search with SQLAlchemy

I would like to be able to full text search across several text fields of one of my SQLAlchemy mapped objects. I would also like my mapped object to support foreign keys and transactions.

I plan to use MySQL to run the full text search. However, I understand that MySQL can only run full text search on a MyISAM table, which does not support transactions and foreign keys.

In order to accomplish my objective I plan to create two tables. My code will look something like this:

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)

users_myisam = Table('users_myisam', Base.metadata,
                     Column('id', Integer),
                     Column('name', String(50)),
                     Column('description', Text),
                     mysql_engine='MyISAM')

conn = Base.metadata.bind.connect()
conn.execute("CREATE FULLTEXT INDEX idx_users_ftxt \
              on users_myisam (name, description)")

Then, to search I will run this:

q = 'monkey'
ft_search = users_myisam.select("MATCH (name,description) AGAINST ('%s')" % q)
result = ft_search.execute()
for row in result: print row

This seems to work, but I have a few questions:

  1. Is my approach of creating two tables to solve my problem reasonable? Is there a standard/better/cleaner way to do this?

  2. Is there a SQLAlchemy way to create the fulltext index, or am I best to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?

  3. Looks like I have a SQL injection problem in my search/match against query. How can I do the select the "SQLAlchemy way" to fix this?

  4. Is there a clean way to join the users_myisam select/match against right back to my user table and return actual User instances, since this is what I really want?

  5. In order to keep my users_myisam table in sync with my mapped object user table, does it make sense for me to use a MapperExtension on my User class, and set the before_insert, before_update, and before_delete methods to update the users_myisam table appropriately, or is there some better way to accomplish this?

Thanks, Michael

Asked by mickeybob on Feb 20 '13 at 03:02


1 Answer

Is my approach of creating two tables to solve my problem reasonable? Is there a standard/better/cleaner way to do this?

I've not seen this use case attempted before, as developers who value transactions and constraints tend to use PostgreSQL in the first place. I understand that may not be possible in your specific scenario.

Is there a SQLAlchemy way to create the fulltext index, or am I best to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?

conn.execute() is fine, though if you want something slightly more integrated you can use the DDL() construct. Read through http://docs.sqlalchemy.org/en/rel_0_8/core/schema.html?highlight=ddl#customizing-ddl for details.
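
As a rough sketch of the DDL() approach, the index statement can be attached to the table's after_create event so that it runs whenever metadata.create_all() creates users_myisam. The table definition repeats the one from the question; the standalone MetaData object and the execute_if(dialect="mysql") guard are assumptions added for illustration.

```python
from sqlalchemy import Column, DDL, Integer, MetaData, String, Table, Text, event

metadata = MetaData()  # assumption: a standalone MetaData instead of Base.metadata

users_myisam = Table(
    "users_myisam", metadata,
    Column("id", Integer),
    Column("name", String(50)),
    Column("description", Text),
    mysql_engine="MyISAM",
)

# Emit the index statement right after the table itself is created,
# and only when the target database is MySQL.
fulltext_index = DDL(
    "CREATE FULLTEXT INDEX idx_users_ftxt ON users_myisam (name, description)"
).execute_if(dialect="mysql")

event.listen(users_myisam, "after_create", fulltext_index)
```

With this in place, a later metadata.create_all(engine) against a MySQL engine should create the table and then the FULLTEXT index in one pass.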

Looks like I have a SQL injection problem in my search/match against query. How can I do the select the "SQLAlchemy way" to fix this?

note: this recipe is only needed for MATCH against multiple columns at once. If you have just one column, the match() operator is simpler.

most basically you could use the text() construct:

from sqlalchemy import text, bindparam

users_myisam.select(
  text("MATCH (name,description) AGAINST (:value)", 
       bindparams=[bindparam('value', q)])
)
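
On more recent SQLAlchemy releases the bindparams= keyword to text() and the select(whereclause) calling style have given way to the .bindparams() method and a generative select(). A sketch of the same idea in that style, assuming SQLAlchemy 1.4 or later; the table mirrors users_myisam from the question, and compiling against the MySQL dialect only renders the statement without connecting to a server.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, Text, select, text
from sqlalchemy.dialects import mysql

metadata = MetaData()
users_myisam = Table(
    "users_myisam", metadata,
    Column("id", Integer),
    Column("name", String(50)),
    Column("description", Text),
    mysql_engine="MyISAM",
)

q = "monkey"

# The search term travels as a bound parameter, never as part of the SQL string.
criterion = text("MATCH (name, description) AGAINST (:value)").bindparams(value=q)
ft_search = select(users_myisam).where(criterion)

# Render for MySQL without opening a connection, to inspect what would be sent.
compiled = ft_search.compile(dialect=mysql.dialect())
print(compiled)         # SQL text with a placeholder where the term goes
print(compiled.params)  # the value that fills the placeholder
```

Because the value is kept separate from the statement text, a search term containing quotes cannot change the structure of the query.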

more comprehensively you could define a custom construct:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement
from sqlalchemy import literal

class Match(ClauseElement):
    def __init__(self, columns, value):
        self.columns = columns
        self.value = literal(value)

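@compiles(Match)
def _match(element, compiler, **kw):
    # render each column and the bound value through the active compiler,
    # forwarding the compiler keyword arguments to both
    return "MATCH (%s) AGAINST (%s)" % (
        ", ".join(compiler.process(c, **kw) for c in element.columns),
        compiler.process(element.value, **kw),
    )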

my_table.select(Match([my_table.c.a, my_table.c.b], "some value"))

docs:

http://docs.sqlalchemy.org/en/rel_0_8/core/compiler.html
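
Newer SQLAlchemy releases (1.4.19 and later, going by the MySQL dialect documentation) also ship a multi-column construct, sqlalchemy.dialects.mysql.match, which may make a hand-written class unnecessary. A small sketch, assuming such a release is installed and reusing the users_myisam layout from the question:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, Text, select
from sqlalchemy.dialects import mysql
from sqlalchemy.dialects.mysql import match

metadata = MetaData()
users_myisam = Table(
    "users_myisam", metadata,
    Column("id", Integer),
    Column("name", String(50)),
    Column("description", Text),
    mysql_engine="MyISAM",
)

# MATCH over several columns; the search term becomes a bound parameter.
match_expr = match(
    users_myisam.c.name,
    users_myisam.c.description,
    against="monkey",
)
stmt = select(users_myisam).where(match_expr)

# Render against the MySQL dialect; no database connection is needed for this.
compiled = stmt.compile(dialect=mysql.dialect())
print(compiled)
```

The same expression can also be passed to order_by() if the rows should be sorted by relevance.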

Is there a clean way to join the users_myisam select/match against right back to my user table and return actual User instances, since this is what I really want?

you should probably create a UserSearch class mapped to users_myisam, just like User is mapped to users, then use relationship() to link the two classes together (as User.search_table below). Simple operations like this are then possible:

query(User).join(User.search_table).\
           filter(Match([UserSearch.x, UserSearch.y], "some value"))
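
To make the mapping concrete, here is one possible sketch of the two classes and the join, assuming SQLAlchemy 1.4.19 or later. The class name UserSearch and the attribute search_table follow the query above; the explicit primaryjoin with foreign() is my assumption, chosen so that no foreign key constraint has to be declared on the MyISAM table. The built-in mysql.match construct stands in for the custom Match class.

```python
from sqlalchemy import Column, Integer, String, Text, select
from sqlalchemy.dialects import mysql
from sqlalchemy.dialects.mysql import match
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class UserSearch(Base):
    """MyISAM copy of the searchable columns (hypothetical mapping)."""
    __tablename__ = "users_myisam"
    __table_args__ = {"mysql_engine": "MyISAM"}

    id = Column(Integer, primary_key=True, autoincrement=False)
    name = Column(String(50))
    description = Column(Text)


class User(Base):
    __tablename__ = "users"
    __table_args__ = {"mysql_engine": "InnoDB"}

    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)

    # No ForeignKey constraint is declared, so the join condition is spelled
    # out and users_myisam.id is marked as the "foreign" side.
    search_table = relationship(
        "UserSearch",
        primaryjoin="User.id == foreign(UserSearch.id)",
        uselist=False,
    )


search_cols = UserSearch.__table__.c
stmt = (
    select(User)
    .join(User.search_table)
    .where(match(search_cols.name, search_cols.description, against="monkey"))
)

# Render against the MySQL dialect to inspect the join; no connection needed.
print(stmt.compile(dialect=mysql.dialect()))
```

Executed through a Session, for example with session.execute(stmt).scalars().all(), a statement like this should hand back User instances rather than rows from users_myisam.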

In order to keep my users_myisam table in sync with my mapped object user table, does it make sense for me to use a MapperExtension on my User class, and set the before_insert, before_update, and before_delete methods to update the users_myisam table appropriately, or is there some better way to accomplish this?

MapperExtensions are deprecated, so you'd at least use the event API, and in most cases we want to try applying object mutations outside of the flush process. In this case, I'd be using the constructor for User, or alternatively the init event, as well as a basic @validates decorator which will receive values for the target attributes on User and copy those values into User.search_table.
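
A minimal sketch of the @validates idea, assuming SQLAlchemy 1.4 or later and the same hypothetical UserSearch / search_table names used for the join. The validator creates the companion object on first use and mirrors each assigned value into it; deletes are not handled here and would need separate treatment (for instance a cascade setting on the relationship).

```python
from sqlalchemy import Column, Integer, String, Text
from sqlalchemy.orm import declarative_base, relationship, validates

Base = declarative_base()


class UserSearch(Base):
    """MyISAM copy of the searchable columns (hypothetical mapping)."""
    __tablename__ = "users_myisam"
    __table_args__ = {"mysql_engine": "MyISAM"}

    id = Column(Integer, primary_key=True, autoincrement=False)
    name = Column(String(50))
    description = Column(Text)


class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    description = Column(Text)

    search_table = relationship(
        "UserSearch",
        primaryjoin="User.id == foreign(UserSearch.id)",
        uselist=False,
    )

    @validates("name", "description")
    def _copy_to_search_table(self, key, value):
        # Create the companion row on first use, then mirror the new value.
        if self.search_table is None:
            self.search_table = UserSearch()
        setattr(self.search_table, key, value)
        return value


# The declarative constructor assigns attributes one by one, so the
# validator fires for both name and description here.
user = User(name="monkey", description="likes bananas")
print(user.search_table.name, "|", user.search_table.description)
```

Since the copy happens at attribute-assignment time, the UserSearch object is already populated when the session flushes, which keeps the mutation outside of the flush process.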

Overall, if you've been learning SQLAlchemy from another source (like the O'Reilly book), it's really out of date by many years, and I'd be focusing on the current online documentation.

Answered by zzzeek on Sep 23 '22 at 03:09