
Python Psych Experiment needs (simple) database: please advise

I am coding a psychology experiment in Python. I need to store user information and scores somewhere, and I need it to work as a web application (and be secure).

Don't know much about this - I'm considering XML databases, BerkeleyDB, SQLite, an OpenOffice spreadsheet, and I'm very interested in the Python "shelve" library (most of my info is coming from this thread: http://developers.slashdot.org/story/08/05/20/2150246/FOSS-Flat-File-Database).

DATA: I figure that I'm going to have at most 1000 users. For each user I've got to store...

  • Username / Pass
  • User detail fields (for a simple profile)
  • User scores on the exercise (2 datapoints per trial: each trial gets a score (correct/incorrect/timeout) and has an associated number from 0.1 to 1.0 that I need to record)
  • Metadata about the trials (when, who, etc.)
  • Results of data analysis for user

VERY rough estimate: each user generates 100 trials / day, so with 1000 users that is a maximum of 100k datapoints / day. It needs to run that way for about 3 months, so about 9m datapoints. A 2x safety multiplier gives me a target of a database that can handle 18m datapoints.

(Note: I could either store trial response data as individual data points, or group trials into Python list objects of varying length (user "sessions"). The latter would dramatically bring down the number of database entries, though not the amount of data. Does it matter? How?)

I want a solution that will work (at least) until I get to this 1000 users level. If my program is popular beyond that level, I'm alright with doing some work modding in a beefier DB. Also reiterating that it must be easily deployable as a web application.

Beyond those basic requirements, I just want the easiest thing that will make this work. I'm pretty green.

Thanks for reading

Tr3y

Tr3y asked Jul 16 '11 16:07


2 Answers

SQLite can certainly handle that amount of data. It has a very large user base, including a few very well-known users on all the major platforms; it's fast and light, and there are awesome GUI clients that allow you to browse and extract/filter data with a few clicks.

SQLite won't scale indefinitely, of course, but severe performance problems begin only when simultaneous inserts are needed, which I would guess is a problem that appears several orders of magnitude beyond your projected load.
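On the question of single data points versus grouped sessions: one way to keep write pressure low while still storing one row per trial is to insert a whole session inside a single transaction. The following is only a sketch; the `trial` table, its columns, and the `save_session` helper are hypothetical names made up for illustration, not anything from the question.

```python
import sqlite3


def save_session(conn, user_id, trials):
    """Insert every trial of one session in a single transaction.

    `trials` is a list of (outcome, value) pairs. The table layout is an
    assumption made for this sketch.
    """
    rows = [(user_id, outcome, value) for outcome, value in trials]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO trial (user_id, outcome, value) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trial ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " outcome TEXT NOT NULL,"
    " value REAL NOT NULL)"
)

inserted = save_session(
    conn, 1, [("correct", 0.4), ("timeout", 1.0), ("incorrect", 0.7)]
)
total = conn.execute("SELECT COUNT(*) FROM trial").fetchone()[0]
```

Keeping one row per trial means later analysis can be done with plain SQL instead of unpacking stored Python lists.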

I have been using it for a few years now, and I have never had a problem with it (although for larger sites I use MySQL). Personally I find that "Small. Fast. Reliable. Choose any three." (which is the tagline on SQLite's site) is quite accurate.

As for ease of use: the SQLite3 bindings (site temporarily down) are part of the Python standard library. Here you can find a small tutorial. Interestingly enough, simplicity is a design criterion for SQLite. From here:

Many people like SQLite because it is small and fast. But those qualities are just happy accidents. Users also find that SQLite is very reliable. Reliability is a consequence of simplicity. With less complication, there is less to go wrong. So, yes, SQLite is small, fast, and reliable, but first and foremost, SQLite strives to be simple.
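To give a feel for the standard-library bindings mentioned above, here is a minimal sketch using the `sqlite3` module. The schema, the table and column names, and the helper functions (`open_db`, `add_user`, `record_trial`, `fraction_correct`) are all assumptions chosen to mirror the data described in the question; password hashing is assumed to happen elsewhere.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema for illustration: one table of users, one row per trial.
SCHEMA = """
CREATE TABLE IF NOT EXISTS user (
    id            INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS trial (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES user(id),
    taken_at TEXT NOT NULL,
    outcome  TEXT NOT NULL
             CHECK (outcome IN ('correct', 'incorrect', 'timeout')),
    value    REAL NOT NULL CHECK (value BETWEEN 0.1 AND 1.0)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_user(conn, username, password_hash):
    """Store a user; the hash is assumed to be computed by the caller."""
    with conn:
        cur = conn.execute(
            "INSERT INTO user (username, password_hash) VALUES (?, ?)",
            (username, password_hash),
        )
    return cur.lastrowid


def record_trial(conn, user_id, outcome, value):
    """Store one trial result with a UTC timestamp."""
    taken_at = datetime.now(timezone.utc).isoformat()
    with conn:
        conn.execute(
            "INSERT INTO trial (user_id, taken_at, outcome, value)"
            " VALUES (?, ?, ?, ?)",
            (user_id, taken_at, outcome, value),
        )


def fraction_correct(conn, user_id):
    """Return the share of a user's trials that were correct (None if no trials)."""
    row = conn.execute(
        "SELECT AVG(outcome = 'correct') FROM trial WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0]


conn = open_db()
uid = add_user(conn, "tr3y", "placeholder-hash")
record_trial(conn, uid, "correct", 0.4)
record_trial(conn, uid, "timeout", 1.0)
share = fraction_correct(conn, uid)
```

Passing a file path instead of `":memory:"` to `open_db` would keep the data in a single file on disk.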

mac answered Nov 20 '22 04:11


There's a pretty spot-on discussion of when to use SQLite here. My favorite line is this:

Another way to look at SQLite is this: SQLite is not designed to replace Oracle. It is designed to replace fopen().

It seems to me that for your needs, SQLite is perfect. Indeed, it seems to me very possible that you will never need anything else:

With the default page size of 1024 bytes, an SQLite database is limited in size to 2 terabytes (2^41 bytes).

It doesn't sound like you'll have that much data at any point.
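A back-of-the-envelope check makes the gap concrete. The sketch below takes the question's inputs at face value (1000 users, 100 trials per user per day, roughly 90 days, a 2x safety multiplier) and compares the result with the 2^41-byte limit quoted above. The 100 bytes per row is a deliberately generous guess, not a measurement.

```python
USERS = 1000
TRIALS_PER_USER_PER_DAY = 100
DAYS = 90                  # "about 3 months"
SAFETY_MULTIPLIER = 2
BYTES_PER_TRIAL = 100      # ASSUMPTION: generous per-row size, not measured
SQLITE_LIMIT = 2 ** 41     # limit quoted for the default 1024-byte page size

estimated_rows = USERS * TRIALS_PER_USER_PER_DAY * DAYS * SAFETY_MULTIPLIER
estimated_bytes = estimated_rows * BYTES_PER_TRIAL
fraction_of_limit = estimated_bytes / SQLITE_LIMIT

print(f"{estimated_rows:,} rows, about {estimated_bytes / 1e9:.1f} GB")
print(f"{fraction_of_limit:.4%} of the quoted size limit")
```

Even with every user active every day, the estimate stays below a tenth of a percent of the quoted limit.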

senderle answered Nov 20 '22 05:11