Is kdb fast solely due to processing in memory?

Tags:

kdb

I've heard quite a few times people talking about how kdb+ deals with millions of rows in nearly no time. Why is it that fast? Is that solely because the data is all organized in memory?

Another question: are there alternatives to this? Do any big database vendors provide in-memory databases?

asked Nov 13 '13 09:11 by zinking


1 Answer

A quick Google search came up with the answer:

Many operations are more efficient with a column-oriented approach. In particular, operations that need to access a sequence of values from a particular column are much faster. If all the values in a column have the same size (which is true, by design, in kdb), things get even better. This type of access pattern is typical of the applications for which q and kdb are used.

To make this concrete, let's examine a column of 64-bit, floating point numbers:

q).Q.w[] `used
108464j
q)t: ([] f: 1000000 ? 1.0)
q).Q.w[] `used
8497328j
q)

As you can see, the memory needed to hold one million 8-byte values is only a little over 8MB. That's because the data are being stored sequentially in an array. To clarify, let's create another table:

q)u: update g: 1000000 ? 5.0 from t
q).Q.w[] `used
16885952j
q)

Both t and u are sharing the column f. If q organized its data in rows, the memory usage would have gone up another 8MB. Another way to confirm this is to take a look at k.h.
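For readers without a q session handy, both effects can be sketched in Python with NumPy (my own analogy, not how kdb+ is implemented internally): a contiguous array of one million 64-bit floats occupies exactly 8,000,000 bytes, and two table-like structures can share a column without copying it.

```python
import numpy as np

# A column of one million 64-bit floats, packed sequentially in one buffer.
f = np.random.default_rng().random(1_000_000)   # dtype is float64
print(f.nbytes)                  # 8000000 bytes: 1,000,000 * 8
print(f.flags["C_CONTIGUOUS"])   # True: values are stored contiguously

# Two table-like dicts referencing the same column object; deriving u
# from t costs memory only for the new column g, as in the q session above.
t = {"f": f}
u = {"f": f, "g": np.random.default_rng().random(1_000_000)}
print(u["f"] is t["f"])          # True: column f is shared, not copied
```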

Now let's see what happens when we write the table to disk:

q)`:t/ set t
`:t/
q)\ls -l t
"total 15632"
"-rw-r--r-- 1 kdbfaq staff 8000016 May 29 19:57 f"
q)

16 bytes of overhead. Clearly, all of the numbers are being stored sequentially on disk. Efficiency is about avoiding unnecessary work, and here we see that q does exactly what needs to be done when reading and writing a column - no more, no less.
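A rough analogue in Python (my own sketch, not kdb+'s file format): dumping a NumPy array with `tofile` writes the raw values sequentially, so the file size is exactly 8 bytes per value; kdb+ adds its own 16-byte header on top of the same layout.

```python
import os
import tempfile

import numpy as np

f = np.arange(1_000_000, dtype=np.float64)
path = os.path.join(tempfile.mkdtemp(), "f")

# Raw sequential dump of the column, comparable to a kdb+ column file
# (minus kdb+'s 16-byte header).
f.tofile(path)
print(os.path.getsize(path))  # 8000000: eight bytes per value, nothing else
```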

OK, so this approach is space efficient. How does this data layout translate into speed?

If we ask q to sum all 1 million numbers, having the entire list packed tightly together in memory is a tremendous advantage over a row-oriented organization, because we'll encounter fewer misses at every stage of the memory hierarchy. Avoiding cache misses and page faults is essential to getting performance out of your machine.
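The cache-locality argument applies to any vector library; here is a sketch with NumPy (an analogy rather than kdb+ itself, with timings left out) contrasting a single pass over one contiguous buffer with an object-at-a-time loop that gives up locality and vectorization.

```python
import numpy as np

col = np.random.default_rng(0).random(1_000_000)

# Columnar sum: one linear pass over a contiguous 8 MB buffer, which the
# CPU can stream through cache with hardware prefetching and SIMD adds.
s_vec = col.sum()

# Row-at-a-time analogue: a Python-level loop touches one boxed value per
# iteration, defeating locality and vectorization entirely.
s_loop = 0.0
for x in col.tolist():
    s_loop += x

print(s_vec)  # the two results agree up to floating-point rounding
```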

Moreover, doing math on a long list of numbers that are all together in memory is a problem that modern CPU instruction sets have special features to handle, including instructions to prefetch array elements that will be needed in the near future. Although those features were originally created to improve PC multimedia performance, they turned out to be great for statistics as well. In addition, the same synergy of locality and CPU features enables column-oriented systems to perform linear searches (e.g., in where clauses on unindexed columns) faster than indexed searches (with their attendant branch prediction failures) up to astonishing row counts.
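The point about fast linear searches can likewise be sketched with NumPy (my analogy for a where clause on an unindexed column, not q's actual implementation): the predicate is evaluated as one branch-free vectorized pass over the column, yielding the matching row indices.

```python
import numpy as np

price = np.random.default_rng(1).random(1_000_000) * 100.0

# An unindexed "where price > 99.0": a single linear scan over the
# contiguous column, which compiles down to SIMD compares with no
# data-dependent branches in the hot loop.
mask = price > 99.0
hits = np.flatnonzero(mask)   # row indices satisfying the predicate

print(len(hits))  # roughly 1% of a million rows
```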

Source(s): http://www.kdbfaq.com/kdb-faq/tag/why-kdb-fast

answered Sep 20 '22 08:09 by Hrach Ghapantsyan