Considering my users can save data as "café" or "cafe", I need to be able to search those fields with an accent-insensitive query.
I've found https://github.com/djcoin/django-unaccent/, but I have no idea whether something similar can be implemented in SQLAlchemy.
I'm using PostgreSQL, so a solution specific to that database is fine with me. A generic solution would be much better, though.
Thanks for your help.
First, install the unaccent extension in PostgreSQL:
CREATE EXTENSION unaccent;
Next, declare the SQL function unaccent in Python:
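from sqlalchemy.sql.functions import ReturnTypeFromArgs

class unaccent(ReturnTypeFromArgs):
    # Generic SQL function unaccent(); its return type matches its argument's type.
    inherit_cache = True  # enables statement caching on SQLAlchemy 1.4+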
and use it like this:
for place in session.query(Place).filter(unaccent(Place.name) == "cafe").all():
    print(place.name)
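One thing the snippet above leaves out: only the column is unaccented, so a user who types "café" would be compared against "cafe" and miss. A minimal sketch that unaccents both sides follows. It repeats the unaccent declaration so it is self-contained, and it only renders the SQL PostgreSQL would receive, so no database is needed to run it. The Place model, the table name places, and the helper accent_insensitive_query are hypothetical.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.sql.functions import ReturnTypeFromArgs

Base = declarative_base()


class unaccent(ReturnTypeFromArgs):
    # Generic SQL function unaccent(); its return type matches its argument's type.
    inherit_cache = True


class Place(Base):
    __tablename__ = "places"  # hypothetical table name
    id = Column(Integer, primary_key=True)
    name = Column(String(128))


def accent_insensitive_query(session, term):
    # Hypothetical helper: unaccent both the column and the search term,
    # so "café", "cafe" and "cafè" all compare equal in the database.
    return session.query(Place).filter(unaccent(Place.name) == unaccent(term))


# No database connection needed: just render the SQL for PostgreSQL.
query = accent_insensitive_query(Session(), "café")
print(query.statement.compile(dialect=postgresql.dialect()))
```

The comparison is still case-sensitive; wrapping both sides in lower() as well is one way to relax that.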
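Make sure you have a suitable index if the table is large; otherwise this query will result in a full table scan. An ordinary index on name doesn't help here, because the filter is on the expression unaccent(name) rather than on the column itself.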
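A sketch of such an expression index, assuming SQLAlchemy 1.4+ and that the unaccent extension lives in the public schema. PostgreSQL only accepts IMMUTABLE functions in index expressions, while unaccent() is declared STABLE, so a common workaround is an IMMUTABLE SQL wrapper around the two-argument unaccent(dictionary, text). The wrapper name immutable_unaccent, the index name, and the table are hypothetical. Declaring the wrapper IMMUTABLE is a promise to PostgreSQL: if the unaccent rules change, the index must be rebuilt with REINDEX.

```python
from sqlalchemy import DDL, Column, Index, Integer, MetaData, String, Table, event, func
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex

metadata = MetaData()

places = Table(
    "places",  # hypothetical table
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(128)),
)

# unaccent() is STABLE, but index expressions must be IMMUTABLE, so wrap it.
# "immutable_unaccent" is a hypothetical name; the unaccent extension is
# assumed to be installed in the "public" schema.
create_wrapper = DDL(
    "CREATE OR REPLACE FUNCTION immutable_unaccent(text) RETURNS text "
    "AS $$ SELECT public.unaccent('public.unaccent', $1) $$ "
    "LANGUAGE sql IMMUTABLE STRICT"
)
# On PostgreSQL only, create the wrapper just before CREATE TABLE places runs.
event.listen(places, "before_create", create_wrapper.execute_if(dialect="postgresql"))

# Expression index; it attaches to "places" through the column it references.
name_unaccent_idx = Index("ix_places_name_unaccent", func.immutable_unaccent(places.c.name))


def accent_insensitive_select(term):
    # Queries must use the same expression as the index for the planner to use it.
    return places.select().where(
        func.immutable_unaccent(places.c.name) == func.immutable_unaccent(term)
    )


print(CreateIndex(name_unaccent_idx).compile(dialect=postgresql.dialect()))
```

Note that the query side switches from unaccent() to the wrapper; an index on immutable_unaccent(name) will not be used by a filter on unaccent(name).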
A simple, database-agnostic solution is to store the field(s) that can contain accents twice, once with accents and once without. You then run your searches against the unaccented version.
To generate the unaccented version of a string, you can use Unidecode.
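A quick illustration of what Unidecode produces. It transliterates to the closest ASCII representation, which goes beyond dropping accents (for example "ß" becomes "ss"); that is harmless as long as the stored copy and the search term both go through it. It does not change case, so lowercase both sides if you also want case-insensitive matching.

```python
from unidecode import unidecode

print(unidecode("café"))          # cafe
print(unidecode("Crème Brûlée"))  # Creme Brulee
print(unidecode("Straße"))        # Strasse
```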
To assign the unaccented version automatically whenever a record is inserted or updated, you can use the default and onupdate arguments of the Column definition. For example, using Flask-SQLAlchemy you could do something like this:
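from flask_sqlalchemy import SQLAlchemy
from unidecode import unidecode

db = SQLAlchemy()

def unaccent(context):
    # Unaccent the value written to some_string (None if it isn't part of this INSERT/UPDATE).
    value = context.get_current_parameters().get('some_string')
    return unidecode(value) if value is not None else None

class MyModel(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    some_string = db.Column(db.String(128))
    some_string_unaccented = db.Column(db.String(128), default=unaccent, onupdate=unaccent, index=True)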
Note how I only indexed the unaccented field, because that is the one on which the searches will be made.
Of course, before you can search, you also have to unaccent the value you are searching for. For example:
def search(text):
    # Unaccent the search term with unidecode, exactly like the stored copy.
    return MyModel.query.filter_by(some_string_unaccented=unidecode(text)).all()
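To try the pattern without Flask or PostgreSQL, here is a minimal sketch in plain SQLAlchemy (1.4+ assumed) against in-memory SQLite. It mirrors MyModel above but uses declarative_base instead of db.Model; the table name my_model and the session-taking search signature are assumptions. It exercises both the default (INSERT) and onupdate (UPDATE) paths.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base
from unidecode import unidecode

Base = declarative_base()


def unaccent(context):
    # Unaccent the value written to some_string (None if it isn't in this INSERT/UPDATE).
    value = context.get_current_parameters().get("some_string")
    return unidecode(value) if value is not None else None


class MyModel(Base):
    __tablename__ = "my_model"  # assumed table name
    id = Column(Integer, primary_key=True)
    some_string = Column(String(128))
    some_string_unaccented = Column(String(128), default=unaccent, onupdate=unaccent, index=True)


def search(session, text):
    # Unaccent the search term exactly like the stored copy.
    return session.query(MyModel).filter_by(some_string_unaccented=unidecode(text)).all()


engine = create_engine("sqlite://")  # in-memory SQLite; no unaccent support needed
Base.metadata.create_all(engine)

with Session(engine) as session:
    row = MyModel(some_string="café")
    session.add(row)
    session.commit()  # default=unaccent fills some_string_unaccented
    found = [m.some_string for m in search(session, "cafe")]

    row.some_string = "crème"
    session.commit()  # onupdate=unaccent recomputes the unaccented copy
    updated = row.some_string_unaccented

print(found, updated)  # ['café'] creme
```

Because the unaccenting happens in Python, the database only ever performs a plain equality lookup on an indexed column.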
You can apply the same technique to full text search, if necessary.