
Chunked Response from an Iterator with Play Framework in Scala

I have a large result set from a database call that I need to stream back to the user as it can't all fit into memory.

I am able to stream the results from the database back by setting the options

val statement = session.conn.prepareStatement(query, 
                java.sql.ResultSet.TYPE_FORWARD_ONLY,
                java.sql.ResultSet.CONCUR_READ_ONLY)
statement.setFetchSize(Integer.MIN_VALUE)
....
....
val res = statement.executeQuery

And then by using an Iterator

val result = new Iterator[MyResultClass] {
    def hasNext = res.next
    def next = MyResultClass(someValue = res.getString("someColumn"), anotherValue = res.getInt("anotherValue"))
}
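
One thing to watch in an iterator like this: hasNext calls res.next, so every call to hasNext moves the cursor, and a consumer that calls hasNext twice before next silently skips a row. Below is a minimal sketch of a wrapper that remembers the result of the last advance. It is written against two plain functions rather than java.sql.ResultSet so it can be tried without a database; CursorIterators, fromCursor, advance and read are hypothetical names, not part of JDBC or Play.

```scala
object CursorIterators {

  /** Wraps a forward-only cursor as an Iterator.
    *
    * `advance` moves to the next row and returns false at the end
    * (the contract of ResultSet.next); `read` builds a value from the current row.
    */
  def fromCursor[A](advance: () => Boolean)(read: () => A): Iterator[A] =
    new Iterator[A] {
      // true once the cursor has been advanced for the element not yet handed out
      private var fetched = false
      // result of that advance: is there a row to read?
      private var available = false

      def hasNext: Boolean = {
        if (!fetched) {
          available = advance()
          fetched = true
        }
        available
      }

      def next(): A = {
        if (!hasNext) throw new NoSuchElementException("cursor exhausted")
        fetched = false // the pending row is consumed by this call
        read()
      }
    }
}
```

With the result set from the question, this might be used as CursorIterators.fromCursor(() => res.next())(() => MyResultClass(someValue = res.getString("someColumn"), anotherValue = res.getInt("anotherValue"))).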

In Scala, Iterator extends TraversableOnce, which should allow me to pass the Iterator to the Enumerator class that is used for the chunked response in the Play Framework, according to the documentation at https://www.playframework.com/documentation/2.3.x/ScalaStream

When looking at the source code for Enumerator, I discovered that it has an overloaded apply method for consuming a TraversableOnce object.

I tried using the following code

import play.api.libs.iteratee.Enumerator
val dataContent = Enumerator(result)
Ok.chunked(dataContent)

But this isn't working as it throws the following exception

Cannot write an instance of Iterator[MyResultClass] to HTTP response. Try to define a Writeable[Iterator[MyResultClass]]

I can't find anywhere in the documentation that talks about what Writeable is or does. I thought once the Enumerator consumed the TraversableOnce object, it would take it from there but I guess not??

Adam Ritter asked Mar 05 '15 20:03



1 Answer

Problem in your approach

There are two problems with your approach:

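  1. You are writing the Iterator itself to the Enumerator / Iteratee. Enumerator(result) creates an Enumerator with a single element, the Iterator, so Play is asked to write an Iterator[MyResultClass]. You should write the content of the Iterator instead, e.g. with Enumerator.enumerate(result).
  2. Play doesn't know how to express objects of MyResultClass on an HTTP stream. Convert them to a String representation (e.g. JSON) before writing them.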
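
The first problem can be reproduced without Play. The sketch below uses List from the standard library as a stand-in. It assumes that Enumerator.apply takes its elements as varargs in the same way List.apply does, which is what the error message in the question suggests. The object and method names are made up for the illustration.

```scala
object VarargsPitfall {

  // List(it) is a varargs call: the result holds ONE element, the iterator itself.
  // This mirrors what Enumerator(result) is assumed to do in the question.
  def wrapped(it: Iterator[Int]): List[Iterator[Int]] = List(it)

  // Draining the iterator yields its contents instead.
  // This mirrors Enumerator.enumerate(result).
  def contents(it: Iterator[Int]): List[Int] = it.toList
}
```

The element type tells the two apart: the first result has type List[Iterator[Int]], the second List[Int]. In the same way, the question's code asks for a Writeable[Iterator[MyResultClass]] rather than a Writeable for the elements.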

Example

build.sbt

A simple Play Scala project with H2 and SQL support.

lazy val root = (project in file(".")).enablePlugins(PlayScala)

scalaVersion := "2.11.6"

libraryDependencies ++= Seq(
  jdbc,
  "org.scalikejdbc" %% "scalikejdbc"       % "2.2.4",
  "com.h2database"  %  "h2"                % "1.4.185",
  "ch.qos.logback"  %  "logback-classic"   % "1.1.2"
)

project/plugins.sbt

Just the minimal config for the sbt play plugin in the current stable version

resolvers += "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"

addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.3.8")

conf/routes

Just one route on /json

GET    /json                        controllers.Application.json

Global.scala

Configuration file, creates and fills the database with demo data during startup of the Play application

import play.api.Application
import play.api.GlobalSettings
import scalikejdbc._

object Global extends GlobalSettings {

  override def onStart(app : Application): Unit = {

    // initialize JDBC driver & connection pool
    Class.forName("org.h2.Driver")
    ConnectionPool.singleton("jdbc:h2:mem:hello", "user", "pass")

    // ad-hoc session provider
    implicit val session = AutoSession


    // Create table
    sql"""
      CREATE TABLE persons (
        customer_id SERIAL NOT NULL PRIMARY KEY,
        first_name VARCHAR(64),
        sure_name VARCHAR(64)
      )""".execute.apply()

    // Fill table with demo data
    Seq(("Alice", "Anderson"), ("Bob", "Builder"), ("Chris", "Christoph")).
      foreach { case (firstName, sureName) =>
        sql"INSERT INTO persons (first_name, sure_name) VALUES (${firstName}, ${sureName})".update.apply()
    }
  }
}

models/Person.scala

Here we define the database schema and the Scala representation of the database objects. Key here is the function personWrites. It converts Person objects to JSON representation (real code is conveniently generated by a macro).

package models

import scalikejdbc._
import scalikejdbc.WrappedResultSet
import play.api.libs.json._

case class Person(customerId : Long, firstName: Option[String], sureName : Option[String])

object PersonsTable extends SQLSyntaxSupport[Person] {
  override val tableName : String = "persons"
  def apply(rs : WrappedResultSet) : Person =
    Person(rs.long("customer_id"), rs.stringOpt("first_name"), rs.stringOpt("sure_name"))
}

package object models {
  implicit val personWrites: Writes[Person] = Json.writes[Person]
}

controllers/Application.scala

Here you have the Iteratee / Enumerator code. First we read the data from the database, then we convert the result to an Iterator and then to an Enumerator. That Enumerator would not be useful on its own, because it contains Person objects and Play doesn't know how to write such objects over HTTP. But with the help of personWrites, we can convert these objects to JSON. And Play knows how to write JSON over HTTP.

package controllers

import play.api.libs.json.JsValue
import play.api.mvc._
import play.api.libs.iteratee._
import scala.concurrent.ExecutionContext.Implicits.global
import scalikejdbc._

import models._
import models.personWrites

object Application extends Controller {

  implicit val session = AutoSession

  val allPersons : Traversable[Person] = sql"SELECT * FROM persons".map(rs => PersonsTable(rs)).traversable().apply()
  def personIterator(): Iterator[Person] = allPersons.toIterator
  def personEnumerator() : Enumerator[Person] = Enumerator.enumerate(personIterator)
  def personJsonEnumerator() : Enumerator[JsValue] = personEnumerator.map(personWrites.writes(_))

  def json = Action {
    Ok.chunked(personJsonEnumerator())
  }
}

Discussion

Database config

The database config is a hack in this example. Usually we would configure Play so it provides a data source and handles all the database stuff in the background.

JSON conversion

In the code I call the JSON conversion directly. There are better approaches that lead to more compact code, but the explicit call is easier for a beginner to understand.

The response you get is not really valid JSON. Example:

{"customerId":1,"firstName":"Alice","sureName":"Anderson"}
{"customerId":2,"firstName":"Bob","sureName":"Builder"}
{"customerId":3,"firstName":"Chris","sureName":"Christoph"}

Remark: the line breaks are only for formatting. On the wire it looks like this:

...son"}{"custom...

Instead you get blocks of valid JSON chunked together. That's what you requested. The receiving end can consume each block on its own. But there is a problem: you must find some way to separate the response into the valid blocks.
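
Two common ways to make the blocks separable are sketched below with plain Iterator[String] values, so the idea can be tried without Play. The first appends a newline to every document. This assumes that the serialized documents contain no raw line breaks, which holds for compact output like the example above. The second wraps the stream in "[" and "]" and puts commas between the documents, so the whole response becomes one valid JSON array. JsonFraming and its method names are made up for the illustration.

```scala
object JsonFraming {

  /** One JSON document per line; the receiver splits on '\n'. */
  def newlineDelimited(docs: Iterator[String]): Iterator[String] =
    docs.map(_ + "\n")

  /** Frames the documents as a single JSON array, still producing them lazily. */
  def asJsonArray(docs: Iterator[String]): Iterator[String] = {
    // Prefix every document except the first with a comma.
    val separated = docs.zipWithIndex.map {
      case (doc, 0) => doc
      case (doc, _) => "," + doc
    }
    Iterator("[") ++ separated ++ Iterator("]")
  }
}
```

In the controller, the same idea could probably be applied to the Enumerator directly, e.g. by mapping each JsValue to Json.stringify(js) + "\n" before passing it to Ok.chunked.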

The response itself is indeed chunked. Consider the following HTTP response headers (in JSON HAR format, exported from Google Chrome):

      "status": 200,
      "statusText": "OK",
      "httpVersion": "HTTP/1.1",
      "headers": [
        {
          "name": "Transfer-Encoding",
          "value": "chunked"
        },
        {
          "name": "Content-Type",
          "value": "application/json; charset=utf-8"
        }

Code organization

I put some SQL code in the controller. In this case this is totally fine. If the code becomes bigger, it might be better to move the SQL code into the model and let the controller use a more general (in this case: "monadic plus", i.e. map, filter, flatMap) interface.

In the controller JSON code and SQL code are mixed together. When the code gets bigger, you should organize it, e.g. per technology or per model object / business domain.

Blocking iterator

The usage of an iterator leads to blocking behavior. This is usually not a big problem, but it should be avoided for applications that must handle a lot of load (hundreds or thousands of hits per second) or that must answer really fast (think of trading algorithms working live on the stock exchange). In this case you could use a NoSQL database as a cache (please don't use it as the only data store) or non-blocking JDBC (e.g. async postgres / mysql). Again: this is not necessary for most applications.

Attention: As soon as you convert to an iterator, remember that you can consume an iterator only once. For each request, you need a fresh iterator.
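
The difference can be seen with the standard library alone. In the sketch below (FreshIterators and its members are made-up names), shared is a single Iterator instance that is empty after its first use, while fresh() builds a new one on every call.

```scala
object FreshIterators {
  private val names = Vector("Alice", "Bob", "Chris")

  // A single Iterator instance: the first consumer drains it for everyone.
  val shared: Iterator[String] = names.iterator

  // A method creates a new Iterator on every call, e.g. one per request.
  def fresh(): Iterator[String] = names.iterator
}
```

This is why the controller above declares personIterator and the enumerators with def rather than val.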

Conclusion

A complete WebApp including database access completely in a (not so short) SO answer. I really like the Play framework.

This code is for educational purposes. It is extra awkward in some places, to make it easier for a beginner to understand the concepts. In a real application, you would straighten these things out, because you already know the concepts and you just want to see the purpose of the code (why is it there? which tools is it using? when is it doing what?) at first glance.

Have fun!

stefan.schwetschke answered Sep 25 '22 23:09
