Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL: how would you save users' own data?

I'm on a project that involves time series analytics, and I need to be able to let users upload a file containing their own time series (ie numbers with dates), for instance in a .csv file. Data contained in their files would then be accessible at any time, to be used within our system.

How could I do that? The ideas I've thought about:

  1. Create a table each time a user upload a file (and save somewhere the name of that table). If I have lots of users uploading lots of data, I may end up with tons of tables.
  2. Create one big fat monster table with basically three or four columns: the date of the value; the value; the dataset name (and/or the dataset's owner). Everything is uploaded in that table, and when Bob needs its weather data I just select (date,value) where owner = Bob and datasetname = weatherdata.
  3. In between solution: one table per user, and all Bob's datasets are in Bob's table.
  4. Completely different: just save the .csv file somewhere and read it when you need it.

I keep reading it's bad practice to have a varying number of tables (and I believe it). However my situation is slightly different from other questions I've seen on this site (most people seems to want to create one table per user, when they should create one row per user).

Some additional information:

  • time series data may contain hundreds of thousands observations, maybe millions
  • a priori, saved data should not be modified afterwards. However I guess it would be useful to let users append new data to their time series.
  • a priori, I won't need to do complicated SQL select statements. I just want to read Bob's weather data and I'll probably use it in the chronological order - although you never know what tomorrow may bring.
  • using PostgreSQL 9.1, if that's of any importance.

EDIT Reading some answers I realize I may have not done my job very well, I should have said that I'm clearly already evolving in a SQL environment; I already have a User table; when I write "table" I really mean "relation"; all my 4 ideas involve foreign keys somewhere; and RDBMS normalization is the paradigm unless something else is better. (All this not meaning I'm against not-sql solutions).

like image 918
Arthur Avatar asked Jan 19 '23 09:01

Arthur


2 Answers

I'm going to have to go with the "big fat monster table". This is how relational databases are meant to work, although you should normalize it (create one table for users, another for data sets, and another for the data points). Having multiple tables with identical schemas is a bad idea from all angles - design, management, security, even querying; are you sure you'll never want to combine information from two data sets?

If you really are certain that each data set will be totally isolated then you might also consider not using SQL at all. HDF (hierarchical data format) was literally built for this exact purpose, efficient storage and retrieval of "scientific data sets" which are very often time-series data. "Tables" in HDF are literally called data sets, they can share definitions, they can be multidimensional (e.g. one dimension for the day, one for the time), and they are much cheaper than SQL tables.

I don't normally try to steer people away from SQL, but unusual situations sometimes call for unusual solutions. If you're going to end up with billions of rows in a SQL table (or more) and you have practically no other data to store, then SQL may not be the right solution for you.

like image 62
Aaronaught Avatar answered Jan 26 '23 03:01

Aaronaught


Your ideas are all fairly good ways of accomplishing the task (hopefully i've read it correctly).

What about a relational database? For example a table with username, time uploaded and a unique dataid, then link the dataid to another table containing the dataid foreign key and the raw file data. This would keep the user table to a minimum (and you could possibly merge it with another table, containing the users details for example). Having a separate table for users and then another for passwords and another for emails and then 5 more for data is probably bad practice, but personally I don't see anything wrong with separating files from user data.

What language are you using to process the data? This could also be a deciding factor also.

Hope this helps :)

Tom

like image 21
Xleedos Avatar answered Jan 26 '23 02:01

Xleedos