I'm rather new to Matlab, but I've had success using it to interact with a database (PostgreSQL in my case) by following the official documentation on how to connect to a database using JDBC drivers and then executing a query with exec. However, the example given in the latter case involves concatenating data directly into the query string, which can be inefficient and open to SQL injection.
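To illustrate, this is the sort of pattern I'd like to avoid (table and column names here are made up):

```matlab
% Injection-prone pattern: data concatenated into the SQL string.
% (table/column names are hypothetical)
name = 'O''Brien';
sqlStr = ['INSERT INTO people (name) VALUES (''' name ''')'];
curs = exec(conn, sqlStr);
close(curs);
```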
What I'd really like to do is use bind variables, such as would be done in a JDBC PreparedStatement
. I can't find anything about this in the documentation. Is there some way to do this?
Yes, it is possible, as you pointed out, but you can also do all this by means of the Matlab Database Toolbox.
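For the direct route you mention, here is a minimal sketch, assuming the Database Toolbox connection object conn exposes the underlying java.sql.Connection through its Handle property (table and column names are hypothetical):

```matlab
% Sketch: bind variables via a JDBC PreparedStatement driven from Matlab.
% Assumes conn is an open Database Toolbox connection whose Handle
% property exposes the underlying java.sql.Connection object.
jConn = conn.Handle;
pstmt = jConn.prepareStatement('INSERT INTO mytable (id, val) VALUES (?, ?)');
pstmt.setInt(1, 42);        % bind the first placeholder
pstmt.setDouble(2, 3.14);   % bind the second placeholder
pstmt.executeUpdate();
pstmt.close();
```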
In fact, this toolbox works with PostgreSQL via a direct JDBC connection. Moreover, both of its data insertion methods, datainsert and fastinsert, work via specially created prepared statements. The only difference between the two is the way the respective prepared statements are filled with data. fastinsert does this in Matlab using exactly the technique you mention (different setters such as setDouble, setBoolean, setTimestamp and so on, as well as setObject). But it turns out that this approach is hard to use with large data volumes, because in such cases fastinsert becomes very slow.
datainsert fills the created prepared statement with data within Java, via an object of the com.mathworks.toolbox.database.writeTheData class. This class implements two methods: doubleWrite and cellWrite. doubleWrite fills the prepared statement with scalar numeric data. cellWrite assumes the data is passed as a cell matrix with a scalar object in each cell for the corresponding field and tuple (the object being a numeric or logical Matlab scalar, a Matlab string, or a scalar Java object supported by the JDBC driver, say org.postgresql.jdbc.PgArray in the case of an array). Thus, instead of calling setObject from Matlab (which may incur significant overhead), it is more efficient to first fill the mentioned cell matrix with the necessary objects and then call datainsert from the Matlab Database Toolbox.
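The two-step approach above can be sketched as follows (table and column names are hypothetical; conn is an open Database Toolbox connection):

```matlab
% Sketch: fill a cell matrix with scalar objects first, then pass it
% to datainsert in one call. One row per tuple, one scalar per cell.
colNames = {'id', 'val', 'flag'};
data = { 1, 3.14, true; ...
         2, 2.72, false };
datainsert(conn, 'mytable', colNames, data);
```

fastinsert takes the same arguments, so the two calls are interchangeable at the call site; only the internal filling mechanism differs.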
But in the case when all the objects being inserted have Matlab types (scalars, strings, matrices, multi-dimensional arrays, structures and arbitrary other Matlab types), there is at least one more efficient way to insert data: a special high-performance PostgreSQL client library written 100% in C and based on libpq, called PgMex. It is especially useful for big data volumes (about 1 Gb and more) or when it is necessary to insert non-scalar data (arrays). In such cases both fastinsert and datainsert show degrading performance and a constant lack of Java heap memory, caused by limitations of the JDBC driver for large data sets. This can easily be seen from the following pictures:
Here the performance of fastinsert and datainsert is compared with that of batchParamExec from PgMex (see https://pgmex.alliedtesting.com/#batchparamexec for details). The first picture is for scalar numeric data, the second for arrays. The endpoint of each graph corresponds to the maximum data volume the respective method can pass into the database without any error. A data volume greater than that maximum (specific to each method) causes an "out of Java heap memory" problem (the Java heap size for each experiment is specified at the top of each figure). For further details of the experiments please see the following paper with full benchmarking results for data insertion.
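For orientation, a usage sketch is below; the exact call shapes are assumptions based on the PgMex documentation, so consult https://pgmex.alliedtesting.com for the authoritative signatures (the connection string, table name and type specifiers here are made up):

```matlab
% Sketch of batch insertion via PgMex (call shapes are assumptions --
% see the PgMex documentation for exact signatures).
import com.allied.pgmex.pgmexec;
dbConn = pgmexec('connect', 'host=localhost dbname=mydb user=me');
% $1, $2 are PostgreSQL-style placeholders; the format string declares
% the PostgreSQL type of each bound parameter.
pgmexec('batchParamExec', dbConn, ...
    'INSERT INTO mytable VALUES ($1, $2)', '%int4 %float8', ...
    {int32([1; 2; 3]), [3.14; 2.72; 1.41]});
pgmexec('finish', dbConn);
```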
EDIT: Now PgMex supports free academic licenses.