Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Alternative to sqlite OR a better way to handle date / time fields in sqlite

My data tends to be medium to large but never qualifies as "BIG" data. The data is almost always complexly relational. For the purposes I'm talking about here, 10-50 tables with a total size of 1-10 GB. Nothing more. When I deal with data bigger than this, I'll stick it into Postgres or SQL Server.

Overall, I like SQLite, but the data I work with has lots and lots of date / datetime fields and dealing with date fields in SQLite makes my head hurt and when I move data back and forth between R and SQLite, my dates often get mangled.

I am either looking for a file-based alternative to SQLite that is easy to work with from R.

OR

Better techniques/packages for moving data in/out of SQLite and R without mangling the dates. My goal is to stop mangling my dates. For example, when I use dbWriteTable from the RSQLite package my dates are usually messed up in a way that makes them impossible to work with.

My primary workstation is running Ubuntu but I work in an office dominated by Windows. If suggesting an alternative to SQLite, +++ for an alternative that works on both platforms (or more).

like image 852
Choens Avatar asked Nov 04 '22 00:11

Choens


1 Answers

Use epoch times and dates (days from origin, seconds from origin). The conversion using epochs into R POSIXct or Date is fast (strings are very slow).

Edit: Another alternative, after re-reading and considering the size of your data:

You could simply save the tables directly in R format, perhaps with a small piece of extra metadata describing the key relationships between tables. You would have to create your own conventions and all, but it's definitely smoother (no impedance mismatches).

Also, I'm personally very partial to the package data.table. It's fast and has a syntax which is pure R but has a nice mapping onto SQL concepts. E.g. in dt[i, j, by=list(...)], i corresponds to "where", j correspond to "select", and by to "group by" and there are facilities for joins as well, although I wrote infix wrappers around those so it was easier to remember.

I typically do my data processing work exclusively in R (after an initial pull from SQLITE), and I find data.table more faster and practical than massive SQLDF queries.

http://datatable.r-forge.r-project.org/

like image 194
Yike Lu Avatar answered Nov 07 '22 22:11

Yike Lu