I'm on a project that involves time series analytics, and I need to be able to let users upload a file containing their own time series (ie numbers with dates), for instance in a .csv file. Data contained in their files would then be accessible at any time, to be used within our system.
How could I do that? The ideas I've thought about:
I keep reading it's bad practice to have a varying number of tables (and I believe it). However my situation is slightly different from other questions I've seen on this site (most people seems to want to create one table per user, when they should create one row per user).
Some additional information:
EDIT Reading some answers I realize I may have not done my job very well, I should have said that I'm clearly already evolving in a SQL environment; I already have a User table; when I write "table" I really mean "relation"; all my 4 ideas involve foreign keys somewhere; and RDBMS normalization is the paradigm unless something else is better. (All this not meaning I'm against not-sql solutions).
I'm going to have to go with the "big fat monster table". This is how relational databases are meant to work, although you should normalize it (create one table for users, another for data sets, and another for the data points). Having multiple tables with identical schemas is a bad idea from all angles - design, management, security, even querying; are you sure you'll never want to combine information from two data sets?
If you really are certain that each data set will be totally isolated then you might also consider not using SQL at all. HDF (hierarchical data format) was literally built for this exact purpose, efficient storage and retrieval of "scientific data sets" which are very often time-series data. "Tables" in HDF are literally called data sets, they can share definitions, they can be multidimensional (e.g. one dimension for the day, one for the time), and they are much cheaper than SQL tables.
I don't normally try to steer people away from SQL, but unusual situations sometimes call for unusual solutions. If you're going to end up with billions of rows in a SQL table (or more) and you have practically no other data to store, then SQL may not be the right solution for you.
Your ideas are all fairly good ways of accomplishing the task (hopefully i've read it correctly).
What about a relational database? For example a table with username, time uploaded and a unique dataid, then link the dataid to another table containing the dataid foreign key and the raw file data. This would keep the user table to a minimum (and you could possibly merge it with another table, containing the users details for example). Having a separate table for users and then another for passwords and another for emails and then 5 more for data is probably bad practice, but personally I don't see anything wrong with separating files from user data.
What language are you using to process the data? This could also be a deciding factor also.
Hope this helps :)
Tom
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With