
Matlab database prepared statements and bind variables

I'm rather new to Matlab, but I've had success using it to interact with a database (PostgreSQL in my case) by following the official documentation on how to connect to a database using JDBC drivers, and then executing a query with exec. However, the example given in the latter case involves pasting data into the query, which can be inefficient and open to SQL injection.

What I'd really like to do is use bind variables, such as would be done in a JDBC PreparedStatement. I can't find anything about this in the documentation. Is there some way to do this?

asked Dec 18 '22 by beldaz

1 Answer

Yes, it is possible, as you suspected, by working with the JDBC driver directly; but you may also do all of this by means of the Matlab Database Toolbox.
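For the direct JDBC route the question asks about, here is a minimal sketch. It assumes an already-open Database Toolbox connection `conn` whose `Handle` property exposes the underlying `java.sql.Connection` (as in older toolbox versions); the table and column names are hypothetical:

```matlab
% Obtain the underlying java.sql.Connection from the toolbox connection.
% (In older Database Toolbox versions it is exposed as conn.Handle.)
jConn = conn.Handle;

% Create a prepared statement with bind variables ("?" placeholders).
pstmt = jConn.prepareStatement( ...
    'INSERT INTO measurements (sensor_id, reading) VALUES (?, ?)');

% Bind values by 1-based parameter index, then execute.
pstmt.setInt(1, 42);
pstmt.setDouble(2, 3.14);
pstmt.executeUpdate();
pstmt.close();
```

Queries work the same way: call `executeQuery` instead of `executeUpdate` and iterate over the returned `java.sql.ResultSet`.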

Data insertion via Matlab Database Toolbox

In fact, this toolbox works with PostgreSQL via a direct JDBC connection. Moreover, both of its data insertion methods, datainsert and fastinsert, work via specially created prepared statements. The only difference between the two is in how the respective prepared statements are filled with data.

fastinsert does this in Matlab using exactly the technique you mentioned (different setters such as setDouble, setBoolean, setTimestamp and so on, as well as setObject). But this approach turns out to be impractical for large data volumes, because fastinsert then becomes very slow.

datainsert fills the created prepared statement with data within Java, via an object of the com.mathworks.toolbox.database.writeTheData class. This class implements two methods: doubleWrite and cellWrite. doubleWrite fills the prepared statement with scalar numeric data. cellWrite assumes the data is passed as a cell matrix with a scalar object in each cell (the object being a numeric or logical Matlab scalar, a Matlab string, or a scalar Java object supported by the JDBC driver, say, org.postgresql.jdbc.PgArray in the case of an array) for the corresponding field and tuple. Thus, instead of calling setObject from Matlab (which may incur significant overhead), it is more efficient to first fill such a cell matrix with the necessary objects and then call datainsert from the Matlab Database Toolbox.
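The cell-matrix approach described above can be sketched as follows, again assuming an open Database Toolbox connection `conn`; the table and column names are hypothetical:

```matlab
% One row per tuple, one column per field; each cell holds a scalar
% numeric, logical, string, or JDBC-supported Java object.
colNames = {'sensor_id', 'reading', 'is_valid'};
data = { ...
    1, 0.5,  true; ...
    2, 1.25, false};

% datainsert fills its prepared statement with this data on the Java
% side, avoiding per-element setObject calls from Matlab.
datainsert(conn, 'measurements', colNames, data);
```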

Performance of data insertion methods

But when all the objects being inserted have Matlab types (scalars, strings, matrices, multi-dimensional arrays, structures and arbitrary other Matlab types), there is at least one more efficient way to insert data: a special high-performance PostgreSQL client library written 100% in C and based on libpq, called PgMex. It is especially useful for large data volumes (about 1 GB and more) or when non-scalar data (arrays) must be inserted. In such cases both fastinsert and datainsert show degrading performance and a persistent lack of Java heap memory, caused by limitations of the JDBC driver for large data sets. This can easily be seen from the following figures:

[Figures: insertion performance for the case of scalar numeric data and for the case of arrays]

Here the performance of fastinsert and datainsert is compared with that of batchParamExec from PgMex (see https://pgmex.alliedtesting.com/#batchparamexec for details). The first figure is for scalar numeric data, the second for arrays. The endpoint of each graph corresponds to the maximum data volume the corresponding method can pass into the database without any error; a data volume greater than that maximum (specific to each method) causes an "out of Java heap memory" problem (the Java heap size for each experiment is given at the top of each figure). For further experimental details please see the following paper with full benchmarking results for data insertion.

EDIT: Now PgMex supports free academic licenses.

answered Dec 28 '22 by Peter Gagarinov