 

Batch insert with table that has many columns using Anorm

Tags:

mysql

scala

anorm

I'm trying to do a batch insert into a MySQL database table using Anorm (in Play Framework 2.3.1). The application I'm building has a standard web front end in addition to requiring batch data inserts, and I would like to keep the logic on the same software stack.

The inserts only go into the same few tables.

The number of rows to be inserted at once will reach the hundreds and may get into the thousands; I expect I may need to limit the number of rows per insert at some point due to Anorm / MySQL / other limits.
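For example, I could split the rows into fixed-size chunks and run one insert per chunk (the chunk size of 500 below is an arbitrary placeholder, not a known MySQL limit):

```scala
// Sketch: split the row data into fixed-size chunks,
// each chunk destined for its own batch insert.
val rows: Seq[Int] = (1 to 1200).toList // stand-in for the real row data

val batches: List[Seq[Int]] = rows.grouped(500).toList

// Three chunks of 500, 500 and 200 rows.
val sizes: List[Int] = batches.map(_.size)
```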

The MySQL driver I'm using is mysql-connector-java 5.1.31.

Below is a cut-down use case.

Using table:

CREATE TABLE table1
(
  col1    INTEGER   NOT NULL,
  col2    BIGINT,
  col3    VARCHAR(255)
); 

And this Scala code:

import play.api.Play.current
import play.api.db.DB
import anorm._ 

object TestInserts {

  DB.withConnection("spo") { implicit conn => 

    val theInserts = Seq(
       Seq[NamedParameter]('val1 -> 1, 'val2 -> Some(1L), 'val3 -> Some("One"))
      ,Seq[NamedParameter]('val1 -> 2, 'val2 -> Some(2L), 'val3 -> Some("Two"))
      ,Seq[NamedParameter]('val1 -> 3, 'val2 -> Some(3L), 'val3 -> Some("Three"))
    )

    val insertBatchSQL = BatchSql( SQL("insert into table1 (col1, col2, col3) values ({val1}, {val2}, {val3})"), theInserts)  

    insertBatchSQL.execute

  } 

}

I am getting the following error

java.sql.SQLException: Parameter index out of range (1 > number of parameters, which is 0).
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1094)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:997)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:983)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:928)
at com.mysql.jdbc.PreparedStatement.checkBounds(PreparedStatement.java:3688)
at com.mysql.jdbc.PreparedStatement.setInternal(PreparedStatement.java:3670)
at com.mysql.jdbc.PreparedStatement.setInternal(PreparedStatement.java:3715)
at com.mysql.jdbc.PreparedStatement.setInt(PreparedStatement.java:3659)
at com.jolbox.bonecp.PreparedStatementHandle.setInt(PreparedStatementHandle.java:828)
at anorm.ToStatement$intToStatement$.set(ToStatement.scala:164)
at anorm.ToStatement$intToStatement$.set(ToStatement.scala:163)
...

I looked at the test classes in Play Framework that test the batch insert (https://github.com/playframework/playframework/blob/master/framework/src/anorm/src/test/scala/anorm/BatchSqlSpec.scala), and as far as I can tell mine should work the same way.

Any pointers on how to fix this or if I should be tackling this in a different way would be great.

asked Jul 04 '14 by Mark Sivill

2 Answers

I'll go with option B. I'm not very familiar with BatchSql, since last I checked it just executes a boatload of queries in sequence, which is awfully slow. I'd recommend aggregating everything into a single query instead. It's a little more tedious, but much faster to execute one query with a thousand inserts than a thousand single-insert queries.

For convenience, let's say you have a Seq of

case class Test(val1: Int, val2: Option[Long], val3: Option[String])

Then you could build your query like this:

val values: Seq[Test] = Seq(....)

/* Index your sequence for later, to map to inserts and parameters alike */
val indexedValues = values.zipWithIndex

/* Create the portion of the insert statement with placeholders, each with a unique index */
val rows = indexedValues.map{ case (value, i) =>
    s"({val1_${i}}, {val2_${i}}, {val3_${i}})"
}.mkString(",")

/* Create the NamedParameters for each `value` in the sequence, each with their unique index in the token, and flatten them together */
val parameters = indexedValues.flatMap{ case(value, i) =>
    Seq(
        NamedParameter(s"val1_${i}" -> value.val1),
        NamedParameter(s"val2_${i}" -> value.val2),
        NamedParameter(s"val3_${i}" -> value.val3)
    ) 
}

/* Execute the insert statement, applying the aggregated parameters */
SQL("INSERT INTO table1 (col1, col2, col3) VALUES " + rows)
    .on(parameters: _*)
    .executeInsert()

Notes:

You will have to check that values is non-empty before proceeding, as an empty sequence would generate an invalid SQL statement.

Depending on how many rows and columns you're inserting, the token parser that creates the prepared statement will eventually slow down from the sheer number of tokens to parse (and the string size). I've noticed this after a few hundred rows with several columns. This can be mitigated somewhat. Since Scala is a strongly typed language, Int and Long pose no threat of SQL injection, so you could inline just those columns into the SQL string via interpolation/concatenation and bind only the unsafe columns with NamedParameter as usual. That would cut down on the number of tokens that need to be parsed.
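As a rough sketch of that mitigation (plain Scala, with the actual Anorm binding elided; Test is the case class from above): inline the numeric columns directly into the statement and keep a placeholder only for the string column.

```scala
case class Test(val1: Int, val2: Option[Long], val3: Option[String])

val values = Seq(Test(1, Some(1L), Some("One")), Test(2, Some(2L), Some("Two")))

// Inline the Int/Long columns directly (numeric types can't carry SQL injection),
// and keep an Anorm-style {placeholder} only for the string column, which would
// still be bound with NamedParameter as before.
val rows = values.zipWithIndex.map { case (v, i) =>
  s"(${v.val1}, ${v.val2.getOrElse("NULL")}, {val3_${i}})"
}.mkString(",")

// rows == "(1, 1, {val3_0}),(2, 2, {val3_1})"
```

This halves the number of tokens the parser has to handle for this table, since only one of the three columns still uses a placeholder.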

answered Sep 30 '22 by Michael Zajac

Some issues with BatchSql were fixed 12 days ago (backported into 2.3.1): https://github.com/playframework/playframework/pull/3087. It should work with that fix.

answered Sep 30 '22 by cchantep