Best way to store user-submitted item names (and their synonyms)

Tags: database, database-design, denormalization, normalization

Consider an e-commerce application with multiple stores. Each store owner can edit the item catalog of his store.

My current database schema is as follows:

item_names: id | name | description | picture | common(BOOL)
items: id | item_name_id | picture | price | description
item_synonyms: id | item_name_id | name | error(BOOL)

Notes: error marks a misspelling (e.g. "Ericson"). The description and picture columns of the item_names table are "globals" that can optionally be overridden by the "local" description and picture columns of the items table (for example, when a store owner wants to supply a different picture for an item). common helps separate unique item names ("Jimmy Joe's Cheese Pizza") from common ones ("Cheese Pizza").
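To make the global/local override concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes that a NULL in items.description or items.picture means "use the value from item_names"; the column types and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_names (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    description TEXT,
    picture TEXT,
    common INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    item_name_id INTEGER NOT NULL REFERENCES item_names(id),
    picture TEXT,        -- assumption: NULL means "use the global picture"
    price REAL,
    description TEXT     -- assumption: NULL means "use the global description"
);
INSERT INTO item_names (id, name, description, picture, common)
VALUES (1, 'Cheese Pizza', 'Plain cheese pizza', 'global.jpg', 1);
INSERT INTO items (id, item_name_id, picture, price, description)
VALUES (1, 1, NULL, 8.5, NULL),
       (2, 1, 'store.jpg', 9.0, NULL);
""")


def effective_items(conn):
    """Return each item with local values falling back to the global ones."""
    return conn.execute("""
        SELECT i.id,
               n.name,
               COALESCE(i.description, n.description) AS description,
               COALESCE(i.picture, n.picture) AS picture
        FROM items AS i
        JOIN item_names AS n ON n.id = i.item_name_id
        ORDER BY i.id
    """).fetchall()


rows = effective_items(conn)
for row in rows:
    print(row)
```

Item 1 falls back to the global picture, while item 2 keeps the picture supplied by its store.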

I think the bright side of this schema is:

Optimized searching & Handling Synonyms: I can query the item_names & item_synonyms tables using name LIKE '%QUERY%' and obtain the list of item_name_ids that need to be joined with the items table. (Examples of synonyms: "Sony Ericsson", "Sony Ericson", "X10", "X 10")
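The lookup described above could be sketched roughly as follows, again with sqlite3. The function name search_items and the sample data are hypothetical, and the tables are trimmed to the columns the query needs. Note that % and _ in the user's input are not escaped here, and that a pattern with a leading wildcard generally cannot be served from an ordinary index on name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE item_synonyms (
    id INTEGER PRIMARY KEY,
    item_name_id INTEGER NOT NULL REFERENCES item_names(id),
    name TEXT NOT NULL,
    error INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    item_name_id INTEGER NOT NULL REFERENCES item_names(id),
    price REAL
);
INSERT INTO item_names VALUES (1, 'Sony Ericsson Xperia X10'), (2, 'Cheese Pizza');
INSERT INTO item_synonyms VALUES (1, 1, 'Sony Ericson', 1), (2, 1, 'X 10', 0);
INSERT INTO items VALUES (10, 1, 499.0), (11, 1, 479.0), (12, 2, 8.5);
""")


def search_items(conn, query):
    """Find items whose canonical name or any synonym contains the query."""
    pattern = "%" + query + "%"
    return conn.execute("""
        SELECT i.id, i.price
        FROM items AS i
        WHERE i.item_name_id IN (
            SELECT id FROM item_names WHERE name LIKE ?
            UNION
            SELECT item_name_id FROM item_synonyms WHERE name LIKE ?
        )
        ORDER BY i.id
    """, (pattern, pattern)).fetchall()


# The misspelling only matches through item_synonyms, yet both items are found.
print(search_items(conn, "ericson"))
```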

Autocompletion: Again, a simple query to the item_names table. I can avoid the usage of DISTINCT, and it minimizes the number of variations ("Sony Ericsson Xperia™ X10", "Sony Ericsson - Xperia X10", "Xperia X10, Sony Ericsson")
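A rough sketch of that autocomplete query, assuming item_names.name is unique (which is why no DISTINCT is needed) and that suggestions are matched by prefix. The function name autocomplete, the limit parameter, and the sample names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
INSERT INTO item_names (name)
VALUES ('Sony Ericsson Xperia X10'), ('Sony Vaio'), ('Cheese Pizza');
""")


def autocomplete(conn, prefix, limit=10):
    """Suggest canonical names that start with what the user has typed."""
    rows = conn.execute(
        "SELECT name FROM item_names WHERE name LIKE ? ORDER BY name LIMIT ?",
        (prefix + "%", limit),
    )
    return [name for (name,) in rows]


print(autocomplete(conn, "sony"))
```

Because each name is stored once, the suggestion list cannot contain the same product under several spellings unless those spellings were entered as separate item_names rows.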

The down side would be:

Overhead: When inserting an item, I query item_names to see if this name already exists. If not, I create a new entry. When deleting an item, I count the number of entries with the same name. If this is the only item with that name, I delete the entry from the item_names table (just to keep things clean; this accounts for possible erroneous submissions). Updating is a combination of both.
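The bookkeeping described above amounts to a "get or create" on insert and a reference count on delete. Here is a minimal sketch; add_item, delete_item and name_count are hypothetical helper names, and item_synonyms is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    item_name_id INTEGER NOT NULL REFERENCES item_names(id),
    price REAL
);
""")


def add_item(conn, name, price):
    """Reuse the item_names row for this name, creating it if needed."""
    row = conn.execute(
        "SELECT id FROM item_names WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        name_id = conn.execute(
            "INSERT INTO item_names (name) VALUES (?)", (name,)
        ).lastrowid
    else:
        name_id = row[0]
    return conn.execute(
        "INSERT INTO items (item_name_id, price) VALUES (?, ?)", (name_id, price)
    ).lastrowid


def delete_item(conn, item_id):
    """Delete the item, then drop its name row if no other item uses it."""
    row = conn.execute(
        "SELECT item_name_id FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    if row is None:
        return
    conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
    remaining = conn.execute(
        "SELECT COUNT(*) FROM items WHERE item_name_id = ?", (row[0],)
    ).fetchone()[0]
    if remaining == 0:
        conn.execute("DELETE FROM item_names WHERE id = ?", (row[0],))


def name_count(conn):
    return conn.execute("SELECT COUNT(*) FROM item_names").fetchone()[0]


first = add_item(conn, "Cheese Pizza", 8.5)
second = add_item(conn, "Cheese Pizza", 9.0)  # reuses the existing name row
delete_item(conn, first)                      # name row stays: one item still uses it
```

Every write touches two tables, which is the overhead in question. In a real application the steps would also need to run inside one transaction so that concurrent inserts and deletes cannot race.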

Weird Item Names: Store owners sometimes use sentences like "Harry Potter 1, 2 Books + CDs + Magic Hat". There's something off about having so much overhead to accommodate cases like this. This would perhaps be the prime reason I'm tempted to go for a schema like this:

items: id | name | picture | price | description

(... with item_names and item_synonyms as utility tables that I could query)

Is there a better schema you would suggest?
Should item names be normalized for autocomplete? Is this perhaps what Facebook does for "School", "City" entries?
Is the first schema or the second better/optimal for search?

Thanks in advance!

References: (1) Is normalizing a person's name going too far?, (2) Avoiding DISTINCT

EDIT: In the event of two items being entered with similar names, an Admin who sees this simply clicks "Make Synonym", which converts one of the names into a synonym of the other. I don't require a way to automatically detect whether an entered name is a synonym of another. I'm hoping the autocomplete will take care of 95% of such cases. As the table set increases in size, the need to "Make Synonym" will decrease. Hope that clears up the confusion.
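For illustration, the "Make Synonym" action might look something like the sketch below under the first schema. The function name make_synonym and its parameters are hypothetical. It assumes the merged name's items and existing synonyms should all be re-pointed at the name that is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE item_synonyms (
    id INTEGER PRIMARY KEY,
    item_name_id INTEGER NOT NULL REFERENCES item_names(id),
    name TEXT NOT NULL,
    error INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    item_name_id INTEGER NOT NULL REFERENCES item_names(id)
);
INSERT INTO item_names VALUES (1, 'Sony Ericsson'), (2, 'Sony Ericson');
INSERT INTO items VALUES (10, 1), (11, 2);
""")


def make_synonym(conn, keep_id, merge_id, error=False):
    """Turn the name merge_id into a synonym of the name keep_id."""
    with conn:  # commit all steps together, or roll back if one fails
        name = conn.execute(
            "SELECT name FROM item_names WHERE id = ?", (merge_id,)
        ).fetchone()[0]
        # Re-point items and existing synonyms at the surviving name.
        conn.execute(
            "UPDATE items SET item_name_id = ? WHERE item_name_id = ?",
            (keep_id, merge_id),
        )
        conn.execute(
            "UPDATE item_synonyms SET item_name_id = ? WHERE item_name_id = ?",
            (keep_id, merge_id),
        )
        # Record the old name as a synonym, flagged if it is a misspelling.
        conn.execute(
            "INSERT INTO item_synonyms (item_name_id, name, error) VALUES (?, ?, ?)",
            (keep_id, name, int(error)),
        )
        conn.execute("DELETE FROM item_names WHERE id = ?", (merge_id,))


make_synonym(conn, keep_id=1, merge_id=2, error=True)
```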

UPDATE: To those who would like to know what I went ahead with... I've gone with the second schema but removed the item_names and item_synonyms tables in hopes that Solr will provide me with the ability to perform all the remaining tasks I need:

items: id | name | picture | price | description

Thanks everyone for the help!

955

asked Jan 04 '11 06:01

RabidFire

1 Answer

The requirements you state in your comment ("Optimized searching", "Handling Synonyms" and "Autocomplete") are not things that are generally associated with an RDBMS. It sounds like what you're trying to solve is a searching problem, not a data storage and normalization problem. You might want to start looking at some search architectures like Solr.

Excerpted from the Solr feature list:

Faceted Searching based on unique field values, explicit queries, or date ranges

Spelling suggestions for user queries

More Like This suggestions for given document

Auto-suggest functionality

Performance Optimizations

132

answered Sep 29 '22 01:09

Mark Tozzi
