opinions and advice on database structure

Tags: sql, mysql, database-design, data-modeling

I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat file that look like this:

a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e

And I have a list of categories to break these rows up into, for example:

Original   Cat1  Cat2  Cat3  Cat4  Cat5
---------------------------------------
a:b:c:d:e  a     b     c     d     e

As of right this second, the category names are known, as well as the number of categories to break the data down by. But this might change over time (for instance, categories added or removed, so the total number of categories changes).

Okay so I'm not really looking for help on how to parse the rows or get data into a db or anything. I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating them into a variable number of categories.

Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:

Table: Generated
generated_id        int           - unique id for each row generated
generated_timestamp datetime      - timestamp of when row was generated
last_updated        datetime      - timestamp of when row last updated
generated_method    varchar(6)    - method in which row was generated (manual or auto)
original_string     varchar(255)  - the original string

Table: Categories
category_id         int           - unique id for category
category_name       varchar(20)   - name of category

Table: Category_Values
category_map_id     int           - unique id for each value (not sure if I actually need this)
category_id         int           - id value to link to table Categories
generated_id        int           - id value to link to table Generated
category_value      varchar(255)  - value for the category

Basically the idea is when I parse a row, I will insert a new entry into table Generated, as well as X entries in table Category_Values, where X is however many categories there currently are. And the category names are stored in another table Categories.
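
For illustration, the insert flow described above could look roughly like this in MySQL. This is only a sketch under assumptions the question does not state: that `generated_id` and `category_map_id` are AUTO_INCREMENT, that the tables use InnoDB (so the transaction actually applies), and that Categories already holds ids 1 to 5 for Cat1 to Cat5. The `@generated_id` variable name is made up for the example.

```sql
-- Sketch only: assumes generated_id / category_map_id are AUTO_INCREMENT
-- and that Categories already contains ids 1..5 (Cat1..Cat5).
START TRANSACTION;

-- One row for the parsed line itself.
INSERT INTO Generated
    (generated_timestamp, last_updated, generated_method, original_string)
VALUES
    (NOW(), NOW(), 'auto', 'a:b:c:d:e');

-- Capture the id MySQL just assigned so the child rows can point at it.
SET @generated_id = LAST_INSERT_ID();

-- One row per category (X = 5 here), sent as a single multi-row INSERT.
INSERT INTO Category_Values (category_id, generated_id, category_value)
VALUES
    (1, @generated_id, 'a'),
    (2, @generated_id, 'b'),
    (3, @generated_id, 'c'),
    (4, @generated_id, 'd'),
    (5, @generated_id, 'e');

COMMIT;
```

Wrapping both inserts in one transaction means a failure partway through should not leave a Generated row without its category values.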

What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.

Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure...well...I'm not a sql expert, but I think I should be able to do like

SELECT * FROM Generated WHERE original_string = '$string';
-- the script puts the matching generated_id into $id

and then

SELECT * FROM Category_Values WHERE generated_id = '$id';

...and then I'll have my data to work with for search results or a form to alter the data. I'm fairly certain I can even combine this into one query with a join or something, but I'm not that great with SQL so I don't know how to actually do that. The point is, I know I can do what I need with this db structure. But am I making this harder than it needs to be? Making some obvious noob mistake?
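
For reference, the two lookups above can be combined into a single query with joins. A minimal sketch, using the table and column names from the question and a literal search value in place of `$string` (in a real script the value would normally be passed as a bound parameter rather than pasted into the SQL text):

```sql
-- Returns one row per category value for every Generated row
-- whose original string matches the search value.
SELECT g.generated_id,
       g.original_string,
       c.category_name,
       cv.category_value
FROM Generated AS g
JOIN Category_Values AS cv ON cv.generated_id = g.generated_id
JOIN Categories      AS c  ON c.category_id   = cv.category_id
WHERE g.original_string = 'a:b:c:d:e'
ORDER BY g.generated_id, c.category_id;
```

The join to Categories is only needed to show the category names; dropping it gives the same rows as the second query in the question.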


asked May 14 '11 17:05

slinkhi

2 Answers

My suggestion:

Table: Generated
id                  unsigned int autoincrement primary key
generated_timestamp timestamp
last_updated        timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP
generated_method    ENUM('manual','auto')
original_string     varchar(255)

Table: Categories
id                  unsigned int autoincrement primary key
category_name       varchar(20)   

Table: Category_Values
id                  unsigned int autoincrement primary key
category_id         unsigned int
generated_id        unsigned int
category_value      varchar(255)  - value for the category
  FOREIGN KEY `fk_cat` (category_id) REFERENCES Categories (id)
  FOREIGN KEY `fk_gen` (generated_id) REFERENCES Generated (id)
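
As a sketch, the outline above might be written as actual CREATE TABLE statements along these lines. Several details here are assumptions rather than part of the suggestion: MySQL 5.6.5 or later (earlier versions allow only one TIMESTAMP column per table to use CURRENT_TIMESTAMP), the InnoDB engine (needed for the foreign keys to be enforced), the NOT NULL constraints, and the extra keys `idx_original_string`, `uq_category_name` and `uq_gen_cat`, which are optional additions.

```sql
CREATE TABLE Generated (
    id                  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    generated_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_updated        TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                                  ON UPDATE CURRENT_TIMESTAMP,
    generated_method    ENUM('manual', 'auto') NOT NULL,
    original_string     VARCHAR(255) NOT NULL,
    -- Assumption: searches by original string are common enough to index.
    KEY idx_original_string (original_string)
) ENGINE=InnoDB;

CREATE TABLE Categories (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    category_name VARCHAR(20) NOT NULL,
    -- Assumption: category names should not repeat.
    UNIQUE KEY uq_category_name (category_name)
) ENGINE=InnoDB;

CREATE TABLE Category_Values (
    id             INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -- The referencing columns use the same type as the ids they point to.
    category_id    INT UNSIGNED NOT NULL,
    generated_id   INT UNSIGNED NOT NULL,
    category_value VARCHAR(255) NOT NULL,
    -- Assumption: at most one value per category for each generated row.
    UNIQUE KEY uq_gen_cat (generated_id, category_id),
    CONSTRAINT fk_cat FOREIGN KEY (category_id)  REFERENCES Categories (id),
    CONSTRAINT fk_gen FOREIGN KEY (generated_id) REFERENCES Generated (id)
) ENGINE=InnoDB;
```

With `last_updated` declared this way, MySQL refreshes it automatically whenever a row in Generated is changed, so the script does not have to set it on update.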

Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html


answered Sep 24 '22 14:09

Johan

I think this solution is perfect for what you want to do. The Categories list is now flexible, so you can add new categories or retire old ones. (I would recommend thinking long and hard before agreeing to delete a category: would you orphan the existing records, or remove them too?)
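
To make the retire-versus-delete choice concrete, here is a rough sketch of both options. It assumes the `id` column naming from the other answer, and `is_active` is a hypothetical column that does not appear in either schema; category id 5 is just an example.

```sql
-- Option A: retire a category but keep its historical values.
-- is_active is a hypothetical flag column added for this purpose.
ALTER TABLE Categories
    ADD COLUMN is_active TINYINT(1) NOT NULL DEFAULT 1;

UPDATE Categories SET is_active = 0 WHERE id = 5;

-- Option B: delete a category together with every value stored for it,
-- so that no orphaned Category_Values rows are left behind.
START TRANSACTION;
DELETE FROM Category_Values WHERE category_id = 5;
DELETE FROM Categories      WHERE id = 5;
COMMIT;
```

Option A keeps old rows searchable and lets the parsing script simply skip inactive categories; Option B permanently discards the stored values for that category.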

Basically, I'm saying you are right on target. The structure is simple but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).

answered Sep 26 '22 14:09

IAmTimCorey
