
Can Azure Cognitive Search be used as a primary database for some data?

Microsoft promotes Azure Search as "cloud search", but doesn't necessarily say it's a "database" or "data storage". It stops short of saying it's big data.

Can/should Azure Search be used as the primary database for some data? Or should there always be some "primary" datastore that is "duplicated" in Azure Search for search purposes?

If so, in what scenarios does it make sense to use Azure Search as a primary database?

richard asked Oct 18 '16 06:10





1 Answer

Although we generally don't recommend it, you might consider using Azure Search as a primary store if:

  1. Your app can tolerate some data inconsistency. Azure Search is eventually consistent.
    • When you index data, it is not available for querying immediately.
    • Currently there is no mechanism to control concurrent updates to the same document in an index.
    • When reading data using search queries, paging is not based on any kind of snapshot, so you may get missing or duplicated documents.
  2. You don't need to read out the entire contents of your index. Paging in Azure Search relies on the $skip parameter, which is currently capped at a maximum of 100000. For indexes larger than 100000 documents, it can be very tricky to read all your data out. You'll need to pick some field to partition on, and your reads have no consistency guarantees.
  3. You are OK with losing your data in case of accidental deletion. Azure Search does not support backup/restore as of the time of this writing. If you accidentally delete your data, you will need to re-index it from its original source.
  4. You won't need to change your index definition much. Modifying or removing fields from your index currently requires re-indexing all your data (you can add new fields without re-indexing). If Azure Search is your primary store, your only option may be to try to read all the data from your old index into a new one, which is subject to all the aforementioned limitations around consistency, $skip, etc.
  5. Your application's query needs match the features that Azure Search provides. Azure Search supports full-text search, facets, and a subset of the OData filter language, but it does not support things like joins between indexes or arbitrary aggregations. If your app needs different query features than what Azure Search provides, you should consider another NoSQL solution like Azure Cosmos DB.
  6. Your application can tolerate high write latency. Since it is a search engine and not a general-purpose DB, Azure Search is optimized heavily for query performance (especially full-text search queries). This comes at the cost of slower write performance, since every write requires a lot of work to index the data. In particular, you will get the best write throughput by batching indexing actions together (batches can contain up to 1000 indexing actions). Writing documents one at a time to the index will result in much lower throughput.
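
To illustrate the batching advice in point 6, here is a minimal sketch that groups documents into index-request payloads of at most 1000 actions each. It only builds the JSON bodies (using the `value` array and `@search.action` key of the REST indexing API) and does not send anything; the function name `build_index_batches`, the sample fields `id` and `title`, and the choice of `mergeOrUpload` as the default action are assumptions made for the example.

```python
from typing import Any, Dict, Iterable, Iterator, List

MAX_ACTIONS_PER_BATCH = 1000  # per-batch cap mentioned in the answer


def build_index_batches(
    docs: Iterable[Dict[str, Any]],
    action: str = "mergeOrUpload",
    batch_size: int = MAX_ACTIONS_PER_BATCH,
) -> Iterator[Dict[str, List[Dict[str, Any]]]]:
    """Yield request bodies, each holding at most `batch_size` indexing actions."""
    if not 1 <= batch_size <= MAX_ACTIONS_PER_BATCH:
        raise ValueError("batch_size must be between 1 and 1000")
    batch: List[Dict[str, Any]] = []
    for doc in docs:
        # Each action is the document itself plus the action type.
        batch.append({"@search.action": action, **doc})
        if len(batch) == batch_size:
            yield {"value": batch}
            batch = []
    if batch:  # flush the final, partially filled batch
        yield {"value": batch}


# Hypothetical sample data: 2500 small documents.
docs = ({"id": str(i), "title": f"Document {i}"} for i in range(2500))
payloads = list(build_index_batches(docs))
print([len(p["value"]) for p in payloads])  # [1000, 1000, 500]
```

Each payload would then be posted to the index in a single request, so 2500 documents cost three round trips instead of 2500.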

Note that many of these are areas where we want to improve Azure Search in the future for the sake of manageability and ease of use, but it has never been our goal to make Azure Search a general-purpose NoSQL database.
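
As a rough illustration of the "pick some field to partition on" workaround from point 2, the sketch below reads everything out by repeatedly asking for the next page of documents whose key is greater than the last one seen, so `$skip` is never needed. Everything here is an assumption for the example: the numeric, unique, sortable, and filterable field `seq`, the helper names, and the in-memory `fake_fetch_page` that stands in for a real query along the lines of `$filter=seq gt N`, `$orderby=seq asc`, `$top=1000`. As the answer notes, reads like this still carry no consistency guarantees if the index changes while you page.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Doc = Dict[str, Any]
# Hypothetical query function: returns up to `top` documents whose key is
# greater than `after` (or from the start if None), sorted ascending by key.
PageFn = Callable[[Optional[int], int], List[Doc]]


def read_all(fetch_page: PageFn, key: str, top: int = 1000) -> Iterator[Doc]:
    """Walk the whole index by key range instead of by $skip offset."""
    after: Optional[int] = None
    while True:
        page = fetch_page(after, top)
        if not page:
            return
        yield from page
        # The next page starts just past the largest key seen so far.
        after = page[-1][key]


# In-memory stand-in for an index; a real implementation would query the service.
INDEX: List[Doc] = [{"seq": i, "title": f"Document {i}"} for i in range(1, 2501)]


def fake_fetch_page(after: Optional[int], top: int) -> List[Doc]:
    matching = [d for d in INDEX if after is None or d["seq"] > after]
    return sorted(matching, key=lambda d: d["seq"])[:top]


docs = list(read_all(fake_fetch_page, key="seq"))
print(len(docs))  # 2500
```

This approach depends on the key being unique; with duplicate key values, documents sharing the boundary value of a page could be skipped.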

Bruce Johnston answered Sep 23 '22 17:09
