Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database design - looking for a radical change

I have a relational database that stores plant information. There's a plant_id, a plant category which links to a category table, etc.etc. The design is quite good but I'm confronted with a task that is quite interesting. The data must be populated automatically by the users.

The ideal scenario is a user that logs into a website and uploads their set of data, usually in a plain-text format (such as .csv or another Windows document), and this data automatically gets inserted into my plants database.

So I basically give out specification; the user should set their .csv columns to a specific name so that the script can link with my database fields. There's many of these fields that refer to generic attributes of a plant... for example a plant height, or the plant color. This information might be a height using the metric system, or using another system. The color might be something like brownish that is actually the same as saying brown. There's hundreds of these examples.

I'm trying to automatically fix these inconcistencies so that a brownish attribute will link to the actual brown color. This link is important because people using the site will want to locate plants of a given color, and right now I have to go through and manually associate all these different attributes.

How would you fix this and avoid manual intervention?

like image 611
Luca Matteis Avatar asked Feb 19 '11 11:02

Luca Matteis


People also ask

What is an example of radical change?

A real-world example of radical change may be completely transforming how a company structures itself. Perhaps the company will move away from a structure in which there is top-down control of the company to one in which there are a lot of smaller, self-directing teams instead.

What is radical change?

Radical change refers to change that occurs relatively fast and modifies the essence of social structures or organizational practices. Specifically, this type of change affects the resources, norms, and interpretive schemes of groups and individuals. Learn more in: Structure Theory and Government IT.

What is API design?

API design is the collection of planning and architectural decisions you make when building an API. Your basic API design influences how well developers are able to consume it and even how they use it. Just like website design or product design, API design informs the user experience.


2 Answers

Oh! such an interesting bit of work.

You need to actually write a grammar and a parser that can update the data according to the grammar. What you can do is do an initial study of a finite set of data and create some grammar and try uploading the data. If anything fails, log it with details for the grammar, expected value and actual value, for example say something like this "plant color mismatch, expected 'brown (or BROWN, brownish) and found 'borwn'; please update the input data or add 'borwn' to the grammar". This will hint you what to do in the future. Also, you can plan to automatically add such things into the grammar looking at some patterns. But initially, it can be manual. So you shall have to see the log from time to time and update the grammar.

Later, if you feel comfortable, you can expose some interface to the user to update the grammar.

I am talking very positively, but I know there will be lot of challenges. Please get back with your challenges.

like image 139
Kangkan Avatar answered Oct 02 '22 15:10

Kangkan


If you want to accept all these values for plant heights--3mm, 3cm, 3in, 3", 3', 3--then what you want to do is impossible. There's no automated way to tell whether a bare '3' means '3cm' or '3in'.

There are a few different ways to handle this kind of problem. But the approach you take depends in part on who's interested in the data.

If the data your users are uploading is mainly of use to the users themselves, they'll tolerate doing more work for you. If the data is mainly of use to others, they won't.

You can choose to be stingy or liberal in what you accept. The more stingy you are, the more automated your processes can be.

Maybe you can be really stingy by providing your users an Excel spreadsheet with validation.

If you have to be really liberal in what you accept, you'll probably need several tables to map one standard value to many nonstandard values. A table for mapping colors might look like this:

Standard  User-supplied
--
brown     brown
brown     brownish
brown     brn
brown     puce

You'd have to populate that table manually at first. But the more data you process, the more likely a user-supplied color will already exist in the table.

That table doesn't have to exist in a database. It could be external, and your data cleanup process could be external. The "T" part of ETL for data warehousing is sometimes done this way. ("ETL" stands for "Extract, transform, and load".)

like image 40
Mike Sherrill 'Cat Recall' Avatar answered Oct 02 '22 15:10

Mike Sherrill 'Cat Recall'