I've started working with RSQLite and dplyr to efficiently process large datasets. However, I haven't been able to reconcile how to get RSQLite to format dates or what best practices are here. The example below should illustrate where the process falls apart for me:
library(tidyverse)
library(RSQLite)
Date is formatted appropriately
date=seq(as.Date("1910/1/1"), as.Date("1911/1/1"), "days")
x=rnorm(length(date))
df1 <- tibble(date, x)
df1
# A tibble: 366 × 2
date x
<date> <dbl>
1 1910-01-01 1.72459562
2 1910-01-02 0.88216253
3 1910-01-03 -0.35434587
4 1910-01-04 -0.63401467
5 1910-01-05 0.18136909
6 1910-01-06 -0.09513488
7 1910-01-07 -1.03252313
8 1910-01-08 0.40924962
9 1910-01-09 0.90759866
10 1910-01-10 0.60456596
# ... with 356 more rows
dbname = "test.sqlite3"
con <- dbConnect(SQLite(), dbname)
dbWriteTable(con, "test", df1, append=TRUE)
dbListTables(con)
dbListFields(con, "test")
test_db <- src_sqlite(path=dbname)
We lose the formatting which is problematic for subsequent processing.
tbl(test_db, "test")
Source: query [?? x 2]
Database: sqlite 3.11.1 [test.sqlite3]
date x
<dbl> <dbl>
1 -21915 -0.05640646
2 -21914 -0.05640646
3 -21913 -0.05640646
4 -21912 -0.05640646
5 -21911 -0.05640646
6 -21910 -0.05640646
7 -21909 -0.05640646
8 -21908 -0.05640646
9 -21907 -0.05640646
10 -21906 -0.05640646
Can anyone recommend strategies for dealing with dates when using RSQLite?
The date() function returns the date as text in this format: YYYY-MM-DD. The time() function returns the time as text in this format: HH:MM:SS. The datetime() function returns the date and time as text in their same formats: YYYY-MM-DD HH:MM:SS.
SQLite does not support built-in DateTime storage a class, but SQLite allows us to work with timestamp types.
Use the STRFTIME() function to format date\time\datetime data in SQLite. This function takes two arguments. The first argument is a format string containing the date/time part pattern. In our example, we use the format string '%d/%m/%Y, %H:%M'.
Using INTEGER to store SQLite date and time values First, create a table that has one column whose data type is INTEGER to store the date and time values. Second, insert the current date and time value into the datetime_int table. Third, query data from the datetime_int table.
There's no date or time datatype in SQLite. Two possible approaches are to save them as text or as an integer. I suspect they're getting stored as days before the unix epoch (1970-01-01) by default.
If you want to work with the dates as integers, you can select by date:
d <- as.numeric(as.Date("1910-01-04"))
filter(test, date < d)
Or convert back to a tbl with dates as dates:
collect(test) %>% mutate(date = as.Date(date, '1970-01-01'))
You could store the dates as text:
df1$ts <- as.character(df1$date)
dbWriteTable(con, "test", df1, overwrite=TRUE)
tbl(test_db, "test")
This is subsettable in the way you'd hope:
`filter(test, date < "1910-01-04")`
You could also convert to a normal tbl and convert the text to dates:
collect(test) %>% mutate(date = as.Date(date))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With