
Autocomplete using Hibernate Search

I am trying to build a better autocomplete feature for my website. I want to use Hibernate Search for this, but in my experiments so far it only matches whole words.

So, my question: is it possible to search on just a few characters?

E.g. the user types 3 letters, and Hibernate Search returns all words from my DB objects that contain those 3 letters.

PS. Right now I am using a "like" query for this, but my DB has grown a lot and I also want to extend the search functionality to other tables.

asked Mar 19 '11 12:03 by Rem Ma


2 Answers

Major edit: one year on, I was able to improve on the code I originally posted. This is the result:

My indexed entity:

@Entity
@Indexed
@AnalyzerDef(name = "myanalyzer",
    // Split input into tokens according to tokenizer
    tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
    filters = {
        // Normalize token text to lowercase, as the user is unlikely to care about casing when searching for matches
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        // Index partial words (n-grams of each token), so we can provide autocomplete functionality
        @TokenFilterDef(factory = NGramFilterFactory.class, params = { @Parameter(name = "maxGramSize", value = "1024") })
    })
@Analyzer(definition = "myanalyzer")
public class Compound extends DomainObject {
    public static String[] getSearchFields(){...}
    ...
}
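
To get a feel for what this analyzer chain puts into the index, here is a rough plain-Java imitation of it. This is only a sketch: the class and method names (AnalyzerChainSketch, analyze) are made up for illustration, it does not use the Solr factories at all, and minGram/maxGram are plain parameters rather than the filter's real configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the analyzer chain above; not the Lucene/Solr classes.
class AnalyzerChainSketch {

    // Whitespace split -> lowercase -> every substring of length minGram..maxGram.
    static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields one empty string from split()
            }
            String lower = word.toLowerCase(Locale.ROOT);
            int largest = Math.min(maxGram, lower.length());
            for (int size = minGram; size <= largest; size++) {
                for (int start = 0; start + size <= lower.length(); start++) {
                    tokens.add(lower.substring(start, start + size));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Citric Acid", 1, 1024);
        System.out.println(indexed.contains("tri")); // true: "tri" occurs inside "citric"
        System.out.println(indexed.contains("xyz")); // false
    }
}
```

Because every substring ends up as its own token, a lowercased three-letter input can match a token in the index directly. The price is a much larger index than one holding whole words only.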

Every @Field must be tokenized and stored in the index for this to work:
@Field(index = Index.TOKENIZED, store = Store.YES)

The method that produces the suggestions:

@Transactional(readOnly = true)
public synchronized List<String> getSuggestions(final String searchTerm) {
    // Compose query for term over all fields in Compound
    String lowerCasedSearchTerm = searchTerm.toLowerCase();

    // Create a fullTextSession for the sessionFactory.getCurrentSession()
    FullTextSession fullTextSession = Search.getFullTextSession(getSession());

    // New DSL based query composition
    SearchFactory searchFactory = fullTextSession.getSearchFactory();
    QueryBuilder buildQuery = searchFactory.buildQueryBuilder().forEntity(Compound.class).get();
    TermContext keyword = buildQuery.keyword();
    WildcardContext wildcard = keyword.wildcard();
    String[] searchfields = Compound.getSearchFields();
    TermMatchingContext onFields = wildcard.onField(searchfields[0]);
    for (int i = 1; i < searchfields.length; i++)
        onFields = onFields.andField(searchfields[i]);
    // Match on the lowercased term, since the index holds lowercased tokens
    TermTermination matching = onFields.matching(lowerCasedSearchTerm);
    Query query = matching.createQuery();

    // Convert the Search Query into something that provides results: Specify Compound again to be future proof
    FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(query, Compound.class);
    fullTextQuery.setMaxResults(20);

    // Projection does not work on collections or maps which are indexed via @IndexedEmbedded
    List<String> projectedFields = new ArrayList<String>();
    projectedFields.add(ProjectionConstants.DOCUMENT);
    List<String> embeddedFields = new ArrayList<String>();
    for (String fieldName : searchfields)
        if (fieldName.contains("."))
            embeddedFields.add(fieldName);
        else
            projectedFields.add(fieldName);

    @SuppressWarnings("unchecked")
    List<Object[]> results = fullTextQuery.setProjection(projectedFields.toArray(new String[projectedFields.size()])).list();

    // Keep a list of suggestions retrieved by search over all fields
    List<String> suggestions = new ArrayList<String>();
    for (Object[] projectedObjects : results) {
        // Retrieve the search suggestions for the simple projected field values
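        for (int i = 1; i < projectedObjects.length; i++) {
            // A projected value is null when the entity has nothing stored for that field
            if (projectedObjects[i] == null)
                continue;
            String fieldValue = projectedObjects[i].toString();
            if (fieldValue.toLowerCase().contains(lowerCasedSearchTerm))
                suggestions.add(fieldValue);
        }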

        // Extract the search suggestions for the embedded fields from the document
        Document document = (Document) projectedObjects[0];
        for (String fieldName : embeddedFields)
            for (Field field : document.getFields(fieldName))
                if (field.stringValue().toLowerCase().contains(lowerCasedSearchTerm))
                    suggestions.add(field.stringValue());
    }

    // Return the composed list of suggestions, which might be empty
    return suggestions;
}

There's some wrangling I'm doing at the end to handle @IndexedEmbedded fields. If you don't have those, you can simplify the code considerably by projecting only the searchFields and leaving out the document and embeddedFields handling.
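
For what the simplified version of that last step could look like, here is a sketch in plain Java. It assumes the projection holds only plain field values (no DOCUMENT entry at index 0), and the rows are hand-built stand-ins for the query results. SuggestionFilterSketch and collectSuggestions are hypothetical names, not part of the project's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; rows imitate the List<Object[]> a projection query returns.
class SuggestionFilterSketch {

    // Keep every projected value that contains the search term, ignoring case.
    static List<String> collectSuggestions(List<Object[]> rows, String searchTerm) {
        String needle = searchTerm.toLowerCase(Locale.ROOT);
        List<String> suggestions = new ArrayList<>();
        for (Object[] row : rows) {
            for (Object value : row) {
                if (value == null) {
                    continue; // nothing stored for this field
                }
                String text = value.toString();
                if (text.toLowerCase(Locale.ROOT).contains(needle)) {
                    suggestions.add(text);
                }
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] { "Citric acid", "C6H8O7" });
        rows.add(new Object[] { "Nitric acid", null });
        System.out.println(collectSuggestions(rows, "TRI")); // [Citric acid, Nitric acid]
    }
}
```

The contains check is still needed after the search, because a hit only tells you that some field of the entity matched, not which one.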

As before: Hopefully this is useful to the next person to encounter this question. Should anyone have any critique or improvements to the above posted code, feel free to edit and do please let me know.


Edit 3: The project this code was taken from has since been open sourced. Here are the relevant classes:

https://trac.nbic.nl/metidb/browser/trunk/metidb/metidb-core/src/main/java/org/metidb/domain/Compound.java
https://trac.nbic.nl/metidb/browser/trunk/metidb/metidb-core/src/main/java/org/metidb/dao/CompoundDAOImpl.java
https://trac.nbic.nl/metidb/browser/trunk/metidb/metidb-search/src/main/java/org/metidb/search/text/Autocompleter.java

answered Oct 11 '22 10:10 by Tim


You could index the field using an NGramFilter. For best results, use the EdgeNGramFilter from Apache Solr, which creates n-grams from the beginning edge of a term and can be used in Hibernate Search as well.
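
To illustrate what "from the beginning edge" means, here is a small plain-Java sketch. It is not the Solr filter itself; EdgeNGramSketch, edgeNGrams and the gram sizes are made-up names and values for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of edge n-grams; not the Solr EdgeNGramFilter.
class EdgeNGramSketch {

    // Prefixes of the lowercased term, from minGram up to maxGram characters long.
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        String lower = term.toLowerCase(Locale.ROOT);
        int largest = Math.min(maxGram, lower.length());
        for (int size = minGram; size <= largest; size++) {
            grams.add(lower.substring(0, size));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("Glucose", 3, 5)); // [glu, gluc, gluco]
    }
}
```

Only prefixes are indexed, so "glu" finds "glucose" but "cos" does not. That keeps the index small and suits type-ahead input, but if you really need matches anywhere inside a word, as in the question, plain n-grams are the better fit.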

answered Oct 11 '22 11:10 by Thomas