I have a C application that streams data to a kdb memory table all day, eventually outgrowing the size of my server RAM. The goal ultimately is to store data on disk, so I decided to run a timer partition function to transfer data gradually. I came up with this code:
part_timer:{[]
  (`$db) upsert .Q.en[`$sym_path] select[20000] ts,exch,ticker,side,price,qty,bid,ask from md;
  delete from `md where i<20000;
 }
.z.ts: part_timer
.z.zd: 17 2 6i
\t 1000
Is this the correct way to partition streaming data in real-time? How would you write this code? I'm concerned about the delete statement not being synchronized with the select.
While not an explicit solution to your issue, take a look at w.q. It is a write-only alternative to the traditional RDB: it buffers up incoming records and, every MAXROWS rows, writes the data to disk.
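The buffer-and-flush idea behind w.q can be sketched roughly as follows. This is a minimal, hypothetical illustration, not w.q's actual code; MAXROWS, the buffer table, and the disk paths are all assumptions:

```q
/ minimal sketch of the buffer-and-flush idea (not w.q's implementation)
MAXROWS:50000
buf:()                                       / in-memory buffer
upd:{[t;x]
  buf,::x;                                   / append incoming rows to the global buffer
  if[MAXROWS<=count buf;
    `:/db/buffer/ upsert .Q.en[`:/db] buf;   / flush: enumerate syms, append to disk
    buf::0#buf]                              / empty the buffer, keeping the schema
 }
```

Because the flush happens inside the update callback, the write and the buffer reset cannot interleave with other incoming messages on a single-threaded process.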
In the above comment you asked:
If not, how can I reorganize the db effectively at the end of the day to store symbols sequentially?
I know this answer is a bit delayed, but this might help someone else who is trying to do the same thing.
Run the following to sort the data on disk (this is slower than pulling it into RAM, sorting, and then writing back to disk):
par:.Q.par[PATH;.z.D;TABLE];
`sym xasc par;
@[par;`sym;`p#];
Where:

PATH: `:path/on/disk/to/db/root

For single-file tables:

TABLE: `tableName

For splayed tables:

TABLE: `$"tableName/"
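For instance, with a splayed trade table under today's date partition (the paths and table name here are hypothetical):

```q
/ hypothetical example: sort today's splayed trade partition by sym, on disk
par:.Q.par[`:/data/db;.z.D;`$"trade/"];  / resolves to e.g. `:/data/db/2013.01.01/trade/
`sym xasc par;                           / sort the on-disk table by sym
@[par;`sym;`p#];                         / apply the parted attribute to sym
```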
At end of day (i.e. when you don't expect more data to be appended), from your C program you can call:
Write to a location for 2013.01.01:

.Q.dpft[`:/path/to/location;2013.01.01;`sym;`tableName];

Clear the table:

delete from `tableName

Free up some memory:

.Q.gc peach til system"s"
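Put together, the end-of-day steps might look like this. A sketch only; the destination path and the table name md are assumptions:

```q
/ hypothetical end-of-day routine for an in-memory table md
eod:{[dt]
  .Q.dpft[`:/path/to/location;dt;`sym;`md]; / enumerate, splay and part by sym
  delete from `md;                          / empty the in-memory table
  .Q.gc peach til system"s"                 / reclaim freed heap on all threads
 }
eod .z.D                                    / run for today's date
```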
Of course that assumes you have time/sym columns and want to partition by date. Otherwise
`:/path/to/location/tableName/ set tableName

will splay the table (enumerate any symbol columns, e.g. with .Q.en, before calling set).
You can also append if you wish (see the I/O chapter of Q for Mortals for examples).
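For example, appending further rows to an already-splayed table can be done with upsert on the table's path. The path and the newRows variable below are hypothetical; symbol columns must be enumerated against the same sym domain as the existing data:

```q
/ hypothetical append to an existing splayed table
`:/path/to/location/tableName/ upsert .Q.en[`:/path/to/location] newRows
```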