Process CSV in Scala

Tags: java, sqlite, scala

I am using Scala 2.7.7 and want to parse a CSV file and store the data in an SQLite database.

I ended up using the OpenCSV Java library to parse the CSV file, together with the sqlitejdbc library.

Using these Java libraries makes my Scala code look almost identical to Java code (minus the semicolons, and with val/var).

Because I am dealing with Java objects, I can't use Scala's List, Map, etc. unless I convert between Java and Scala collections or upgrade to Scala 2.8.

Is there a way to simplify my code further using Scala features that I don't know about?

import java.io.FileReader
import au.com.bytecode.opencsv.CSVReader // OpenCSV 2.x package name

// `conn` (java.sql.Connection) and `stat` (java.sql.Statement) are assumed to be open already.
val filename = "file.csv"
val reader = new CSVReader(new FileReader(filename))
var lastSymbol = ""
// In Scala an assignment evaluates to Unit, so the Java idiom
// `while ((aLine = reader.readNext()) != null)` never terminates.
// Read the first line up front and advance at the end of the loop body instead.
var aLine = reader.readNext()
while (aLine != null) {
    val symbol = aLine(0)
    if (!symbol.equals(lastSymbol)) {
        // First row for this symbol: create its table if it does not exist yet.
        try {
            val rs = stat.executeQuery("select name from sqlite_master where name='" + symbol + "';")
            if (!rs.next()) {
                stat.executeUpdate("drop table if exists '" + symbol + "';")
                stat.executeUpdate("create table '" + symbol + "' (symbol,date,open,high,low,close,vol);")
            }
        }
        catch {
            case sqle: java.sql.SQLException =>
                println(sqle)
        }
        lastSymbol = symbol
    }
    val prep = conn.prepareStatement("insert into '" + symbol + "' values (?,?,?,?,?,?,?);")
    prep.setString(1, aLine(0)) // symbol
    prep.setString(2, aLine(1)) // date
    prep.setString(3, aLine(2)) // open
    prep.setString(4, aLine(3)) // high
    prep.setString(5, aLine(4)) // low
    prep.setString(6, aLine(5)) // close
    prep.setString(7, aLine(6)) // vol
    prep.addBatch()
    prep.executeBatch()
    prep.close()
    aLine = reader.readNext()
}
reader.close()
conn.close()
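
The read loop above follows the Java "call until it returns null" pattern. On Scala 2.8 or later, one way to turn that pattern into a regular Scala iterator is `Iterator.continually(...).takeWhile(_ != null)`. The sketch below is an illustration only: it uses `BufferedReader.readLine()` and a naive `split(",")` as a stand-in for `CSVReader.readNext()`, and the object name, helper name, and sample data are made up so that it runs without a file or the OpenCSV jar.

```scala
import java.io.{BufferedReader, StringReader}

object LineLoop {
  // Wraps a "returns null at end of input" call in a Scala Iterator.
  def rows(reader: BufferedReader): Iterator[Array[String]] =
    Iterator
      .continually(reader.readLine()) // call readLine() again on every step
      .takeWhile(_ != null)           // stop at the first null (end of input)
      .map(_.split(","))              // naive split; stands in for CSVReader.readNext()

  // Hypothetical in-memory sample (symbol, date, open) so the sketch needs no file.
  val sample = "AAA,2010-03-15,37.90\nAAA,2010-03-12,38.00\nBBB,2010-03-15,10.00"

  // Distinct symbols in order of first appearance, using ordinary collection methods.
  val symbols: List[String] =
    rows(new BufferedReader(new StringReader(sample))).map(_(0)).toList.distinct
}
```

With OpenCSV the same shape would presumably be `Iterator.continually(reader.readNext()).takeWhile(_ != null)`, which yields an `Iterator[Array[String]]` that can be consumed with `foreach`, `map`, and so on.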
Lydon Ch asked Dec 02 '22 05:12


1 Answer

If your CSV file is simple (no quoted fields or embedded commas), an alternative is to skip the CSV library entirely and parse the file directly in Scala, for example:


case class Stock(line: String) {
  val data = line.split(",")
  val date = data(0)
  val open = data(1).toDouble
  val high = data(2).toDouble
  val low = data(3).toDouble
  val close = data(4).toDouble
  val volume = data(5).toDouble
  val adjClose = data(6).toDouble

  def price: Double = low
}

scala> import scala.io._

scala> Source.fromFile("stock.csv") getLines() map (l => Stock(l))
res0: Iterator[Stock] = non-empty iterator


scala> res0.toSeq  
res1: Seq[Stock] = List(Stock(2010-03-15,37.90,38.04,37.42,37.64,941500,37.64), Stock(2010-03-12,38.00,38.08,37.66,37.89,834800,37.89) //etc...

This has the advantage that you can use the full Scala collections API on the result.
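
As a rough illustration of what the collections API offers here, the sketch below runs a few standard operations (`maxBy`, `sum`, `groupBy`, `filter`) over the two rows shown in the REPL output. The `StockStats` object and the names of the derived values are made up for this example, and the rows are inlined rather than read from `stock.csv` so that it runs on its own.

```scala
object StockStats {
  // Same shape as the Stock class above, repeated so the sketch is self-contained.
  case class Stock(line: String) {
    private val data = line.split(",")
    val date = data(0)
    val open = data(1).toDouble
    val high = data(2).toDouble
    val low = data(3).toDouble
    val close = data(4).toDouble
    val volume = data(5).toDouble
  }

  // The two rows from the REPL output, inlined instead of read from stock.csv.
  val stocks: Seq[Stock] = Seq(
    "2010-03-15,37.90,38.04,37.42,37.64,941500,37.64",
    "2010-03-12,38.00,38.08,37.66,37.89,834800,37.89"
  ).map(Stock(_))

  // Row with the highest daily high.
  val highestHigh: Stock = stocks.maxBy(_.high)

  // Total traded volume over all rows.
  val totalVolume: Double = stocks.map(_.volume).sum

  // Rows grouped by the "yyyy-MM" prefix of the date column.
  val byMonth: Map[String, Seq[Stock]] = stocks.groupBy(_.date.take(7))

  // Dates on which the close was below the open.
  val downDays: Seq[String] = stocks.filter(s => s.close < s.open).map(_.date)
}
```

None of this needs explicit loops or mutable variables, which is the main difference from the JDBC-style code in the question.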

If you prefer parser combinators, there is also an example of a CSV parser combinator on GitHub.
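
The linked example is not reproduced here, so the following is an independent, minimal sketch of what a parser-combinator approach to a single CSV line could look like. It is not the GitHub code. It assumes `scala.util.parsing.combinator.RegexParsers`, which was part of the standard library in Scala 2.8 through 2.10 and is the separate `scala-parser-combinators` module in later versions (the dependency line in the first comment is for scala-cli and is an assumption about your build). The object and method names are made up.

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
import scala.util.parsing.combinator.RegexParsers

object CsvLine extends RegexParsers {
  // Whitespace is significant inside CSV fields, so do not skip it.
  override val skipWhitespace = false

  // Quoted field: may contain commas; a doubled quote ("") stands for one quote.
  def quoted: Parser[String] =
    "\"" ~> """([^"]|"")*""".r <~ "\"" ^^ (_.replace("\"\"", "\""))

  // Unquoted field: anything up to the next comma, quote, or line break.
  def raw: Parser[String] = """[^,"\r\n]*""".r

  def field: Parser[String] = quoted | raw

  // One record: fields separated by commas.
  def record: Parser[List[String]] = repsep(field, ",")

  // Some(fields) if the whole line parses, None otherwise.
  def parseLine(line: String): Option[List[String]] =
    parseAll(record, line) match {
      case Success(fields, _) => Some(fields)
      case _                  => None
    }
}
```

Compared with `split(",")`, this handles quoted fields such as `"b,c"`, at the cost of an extra dependency on current Scala versions. It does not handle quoted fields that span several lines.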

Arjan Blokzijl answered Dec 06 '22 11:12