Does ArangoDB have faceted search?

Tags:

arangodb

Does anyone know whether ArangoDB supports faceted search and how performance compares to other products that support it well (e.g., Solr, MarkLogic) or those that don't (e.g., Mongo)?

After searching the site, reading the docs, and searching the Google group, I don't see it discussed anywhere.

Thanks

user2029783 asked Mar 13 '14 11:03



1 Answer

ArangoDB has a query language (AQL) that supports group-by-like queries, which lets you implement a faceted search. To make sure we have the same understanding of faceted search, let me explain what I think is meant by it. You have, for example, a list of products. Each product has some attributes (e.g. name, model) and some categories (e.g. manufacturer). You can then search for a name, or for a name containing a word. The result lists all matching products plus an indication of how many of them fall into each category. Is that what you meant?

For example, assume you have documents with three attributes (name, attribute1, attribute2) and two categories (category1, category2):

> for (i = 0; i < 10000; i++) db.products.save({category1: i % 5, category2: i % 7, attribute1: i % 13, attribute2: i % 17, name: "Lore Ipsum " + i, productId: i})

so a typical document is:

> db.products.any()
{
  "_id" : "products/8788564659",
  "_rev" : "8788564659",
  "_key" : "8788564659",
  "productId" : 9291,
  "category1" : 1,
  "category2" : 2,
  "attribute1" : 9,
  "attribute2" : 9,
  "name" : "Lore Ipsum 9291"
}

If you want to search for all documents that have attribute1 between 2 and 3 (inclusive), you could use:

> db._query("FOR p IN products FILTER p.attribute1 >= 2 && p.attribute1 <= 3 SORT p.name LIMIT 3 RETURN p").toArray();
[
  {
    "_id" : "products/7159077555",
    "_rev" : "7159077555",
    "_key" : "7159077555",
    "productId" : 1003,
    "category1" : 3,
    "category2" : 2,
    "attribute1" : 2,
    "attribute2" : 0,
    "name" : "Lore Ipsum 1003"
  },
  {
    "_id" : "products/7159274163",
    "_rev" : "7159274163",
    "_key" : "7159274163",
    "productId" : 1004,
    "category1" : 4,
    "category2" : 3,
    "attribute1" : 3,
    "attribute2" : 1,
    "name" : "Lore Ipsum 1004"
  },
  {
    "_id" : "products/7161633459",
    "_rev" : "7161633459",
    "_key" : "7161633459",
    "productId" : 1016,
    "category1" : 1,
    "category2" : 1,
    "attribute1" : 2,
    "attribute2" : 13,
    "name" : "Lore Ipsum 1016"
  }
]

or, if you are only interested in the product identifiers:

> db._query("FOR p IN products FILTER p.attribute1 >= 2 && p.attribute1 <= 3 SORT p.name LIMIT 3 RETURN p.productId").toArray();
[
  1003,
  1004,
  1016
]

Now, to get the facets for, say, category1 along with the first page of results:

>  db._query("LET l = (FOR p IN products FILTER p.attribute1 >= 2 && p.attribute1 <= 3 SORT p.name RETURN p) return [ slice(l,@skip,@count), (FOR p in l collect c1 = p.category1 INTO g return { category1: c1, count: length(g[*].p)}) ]", { skip: 0, count: 3 }).toArray()
[
  [
    [
      {
        "_id" : "products/7159077555",
        "_rev" : "7159077555",
        "_key" : "7159077555",
        "productId" : 1003,
        "category1" : 3,
        "category2" : 2,
        "attribute1" : 2,
        "attribute2" : 0,
        "name" : "Lore Ipsum 1003"
      },
      {
        "_id" : "products/7159274163",
        "_rev" : "7159274163",
        "_key" : "7159274163",
        "productId" : 1004,
        "category1" : 4,
        "category2" : 3,
        "attribute1" : 3,
        "attribute2" : 1,
        "name" : "Lore Ipsum 1004"
      },
      {
        "_id" : "products/7161633459",
        "_rev" : "7161633459",
        "_key" : "7161633459",
        "productId" : 1016,
        "category1" : 1,
        "category2" : 1,
        "attribute1" : 2,
        "attribute2" : 13,
        "name" : "Lore Ipsum 1016"
      }
    ],
    [
      {
        "category1" : 0,
        "count" : 307
      },
      {
        "category1" : 1,
        "count" : 308
      },
      {
        "category1" : 2,
        "count" : 308
      },
      {
        "category1" : 3,
        "count" : 308
      },
      {
        "category1" : 4,
        "count" : 308
      }
    ]
  ]
]
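To make clear what the query above computes, here is a minimal sketch in plain JavaScript that rebuilds the same sample data in memory and derives the page and the facet counts without ArangoDB. The helper name facetCounts is made up for illustration; it is not an ArangoDB function, and the string comparison used for sorting is only assumed to agree with AQL's SORT for these particular names.

```javascript
// Rebuild the 10,000 sample documents in memory, using the same formula
// as the db.products.save(...) loop above.
const products = [];
for (let i = 0; i < 10000; i++) {
  products.push({
    category1: i % 5, category2: i % 7,
    attribute1: i % 13, attribute2: i % 17,
    name: "Lore Ipsum " + i, productId: i
  });
}

// Hypothetical helper: count the documents per value of one category field,
// i.e. what the COLLECT ... INTO part of the query does.
function facetCounts(docs, field) {
  const counts = new Map();
  for (const d of docs) {
    counts.set(d[field], (counts.get(d[field]) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([value, count]) => ({ [field]: value, count }));
}

// Same condition as the FILTER clause, then sort by name (plain string order).
const matches = products
  .filter(p => p.attribute1 >= 2 && p.attribute1 <= 3)
  .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));

const page = matches.slice(0, 3);                 // like slice(l, @skip, @count)
const facets = facetCounts(matches, "category1"); // counts over ALL matches

console.log(page.map(p => p.productId));
console.log(facets);
```

The point to notice is that the facet counts are computed over the full filtered list, not over the three documents on the current page, which is why the query first binds the whole result to l and only then slices it.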

To drill down into category1 (here category1 == 1) and show the facets for category2:

>  db._query("LET l = (FOR p IN products FILTER p.attribute1 >= 2 && p.attribute1 <= 3 && p.category1 == 1 SORT p.name RETURN p) return [ slice(l,@skip,@count), (FOR p in l collect c2 = p.category2 INTO g return { category2: c2, count: length(g[*].p)}) ]", { skip: 0, count: 3 }).toArray()
[
  [
    [
      {
        "_id" : "products/7161633459",
        "_rev" : "7161633459",
        "_key" : "7161633459",
        "productId" : 1016,
        "category1" : 1,
        "category2" : 1,
        "attribute1" : 2,
        "attribute2" : 13,
        "name" : "Lore Ipsum 1016"
      },
      {
        "_id" : "products/7169497779",
        "_rev" : "7169497779",
        "_key" : "7169497779",
        "productId" : 1056,
        "category1" : 1,
        "category2" : 6,
        "attribute1" : 3,
        "attribute2" : 2,
        "name" : "Lore Ipsum 1056"
      },
      {
        "_id" : "products/6982720179",
        "_rev" : "6982720179",
        "_key" : "6982720179",
        "productId" : 106,
        "category1" : 1,
        "category2" : 1,
        "attribute1" : 2,
        "attribute2" : 4,
        "name" : "Lore Ipsum 106"
      }
    ],
    [
      {
        "category2" : 0,
        "count" : 44
      },
      {
        "category2" : 1,
        "count" : 44
      },
      {
        "category2" : 2,
        "count" : 44
      },
      {
        "category2" : 3,
        "count" : 44
      },
      {
        "category2" : 4,
        "count" : 44
      },
      {
        "category2" : 5,
        "count" : 44
      },
      {
        "category2" : 6,
        "count" : 44
      }
    ]
  ]
]

To make that query string more user-friendly, it would be necessary to write some small helper functions in JavaScript. I think the support group https://groups.google.com/forum/#!forum/arangodb would be the right place to discuss your requirements.
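As a rough idea of what such a helper could look like, here is a small sketch that assembles the query string and bind parameters used above. The function buildFacetQuery and its option names are hypothetical, not part of ArangoDB. It assumes the filter expression comes from trusted code (it is pasted into the query as-is) and that results are always sorted by name, as in the examples.

```javascript
// Hypothetical helper (not an ArangoDB API): builds the query string and the
// bind parameters for one page of results plus the counts for one facet.
function buildFacetQuery({ collection, filter, facet, skip = 0, count = 10 }) {
  // Collection and facet names are spliced into the query text, so only
  // plain identifiers are accepted here.
  const ident = /^[A-Za-z_][A-Za-z0-9_]*$/;
  for (const name of [collection, facet]) {
    if (!ident.test(name)) {
      throw new Error("unsafe identifier: " + name);
    }
  }
  const query =
    "LET l = (FOR p IN " + collection +
    " FILTER " + filter + " SORT p.name RETURN p) " +
    "RETURN [ SLICE(l, @skip, @count), " +
    "(FOR p IN l COLLECT v = p." + facet + " INTO g " +
    "RETURN { " + facet + ": v, count: LENGTH(g[*].p) }) ]";
  return { query, bindVars: { skip, count } };
}

const q = buildFacetQuery({
  collection: "products",
  filter: "p.attribute1 >= 2 && p.attribute1 <= 3",
  facet: "category1",
  count: 3
});
console.log(q.query);
console.log(q.bindVars);
// In arangosh the result would then be run the same way as above:
//   db._query(q.query, q.bindVars).toArray()
```

Drilling down is then a matter of appending a condition such as `&& p.category1 == 1` to the filter and passing a different facet name.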

fceller answered Sep 17 '22 19:09