Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to store user-submitted item names (and their synonyms)

Consider an e-commerce application with multiple stores. Each store owner can edit the item catalog of his store.

My current database schema is as follows:

item_names: id | name | description | picture | common(BOOL)
items: id | item_name_id | picture | price | description | picture
item_synonyms: id | item_name_id | name | error(BOOL)

Notes: error indicates a wrong spelling (eg. "Ericson"). description and picture of the item_names table are "globals" that can optionally be overridden by "local" description and picture fields of the items table (in case the store owner wants to supply a different picture for an item). common helps separate unique item names ("Jimmy Joe's Cheese Pizza" from "Cheese Pizza")

I think the bright side of this schema is:

Optimized searching & Handling Synonyms: I can query the item_names & item_synonyms tables using name LIKE %QUERY% and obtain the list of item_name_ids that need to be joined with the items table. (Examples of synonyms: "Sony Ericsson", "Sony Ericson", "X10", "X 10")

Autocompletion: Again, a simple query to the item_names table. I can avoid the usage of DISTINCT and it minimizes number of variations ("Sony Ericsson Xperia™ X10", "Sony Ericsson - Xperia X10", "Xperia X10, Sony Ericsson")

The down side would be:

Overhead: When inserting an item, I query item_names to see if this name already exists. If not, I create a new entry. When deleting an item, I count the number of entries with the same name. If this is the only item with that name, I delete the entry from the item_names table (just to keep things clean; accounts for possible erroneous submissions). And updating is the combination of both.

Weird Item Names: Store owners sometimes use sentences like "Harry Potter 1, 2 Books + CDs + Magic Hat". There's something off about having so much overhead to accommodate cases like this. This would perhaps be the prime reason I'm tempted to go for a schema like this:

items: id | name | picture | price | description | picture

(... with item_names and item_synonyms as utility tables that I could query)

  • Is there a better schema you would suggested?
  • Should item names be normalized for autocomplete? Is this probably what Facebook does for "School", "City" entries?
  • Is the first schema or the second better/optimal for search?

Thanks in advance!

References: (1) Is normalizing a person's name going too far?, (2) Avoiding DISTINCT


EDIT: In the event of 2 items being entered with similar names, an Admin who sees this simply clicks "Make Synonym" which will convert one of the names into the synonym of the other. I don't require a way to automatically detect if an entered name is the synonym of the other. I'm hoping the autocomplete will take care of 95% of such cases. As the table set increases in size, the need to "Make Synonym" will decrease. Hope that clears the confusion.


UPDATE: To those who would like to know what I went ahead with... I've gone with the second schema but removed the item_names and item_synonyms tables in hopes that Solr will provide me with the ability to perform all the remaining tasks I need:

items: id | name | picture | price | description | picture

Thanks everyone for the help!

like image 955
RabidFire Avatar asked Jan 04 '11 06:01

RabidFire


People also ask

What are synonyms in SQL?

A synonym is a database object that serves the following purposes: Provides an alternative name for another database object, referred to as the base object, that can exist on a local or remote server.

What are two reasons to create synonyms?

You want to work on your own tables. E. You want to use another schema's tables.

What is the use of synonym in Oracle?

A synonym may have the same name as the underlying object, provided the underlying object is contained in another schema. Oracle Database Resolution of Synonyms: Example Oracle Database attempts to resolve references to objects at the schema level before resolving them at the PUBLIC synonym level.


1 Answers

The requirements you state in your comment ("Optimized searching", "Handling Synonyms" and "Autocomplete") are not things that are generally associated with an RDBMS. It sounds like what you're trying to solve is a searching problem, not a data storage and normalization problem. You might want to start looking at some search architectures like Solr

Excerpted from the solr feature list:

Faceted Searching based on unique field values, explicit queries, or date ranges

Spelling suggestions for user queries

More Like This suggestions for given document

Auto-suggest functionality

Performance Optimizations

like image 132
Mark Tozzi Avatar answered Sep 29 '22 01:09

Mark Tozzi