
How to read immutable data structures from a file in Scala

I have a data structure made of Jobs each containing a set of Tasks. Both Job and Task data are defined in files like these:

jobs.txt:
JA
JB
JC

tasks.txt:
JB  T2
JA  T1
JC  T1
JA  T3
JA  T2
JB  T1 

The process of creating objects is the following:
- read each job, create it and store it by id
- read task, retrieve job by id, create task, store task in the job

Once the files are read, this data structure is never modified, so I would like the tasks within each job to be stored in an immutable set. But I don't know how to do that efficiently. (Note: the mutable map storing jobs may be left mutable.)

Here is a simplified version of the code:

class Task(val id: String) 

class Job(val id: String) {
    val tasks = collection.mutable.Set[Task]() // This should be immutable
}

val jobs = collection.mutable.Map[String, Job]() // This is ok to be mutable

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) { 
    val job = new Job(line.trim)
    jobs += (job.id -> job)
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
    val tokens = line.split("\t")
    val job = jobs(tokens(0).trim)
    val task = new Task(job.id + "." + tokens(1).trim)
    job.tasks += task
}

Thanks in advance for every suggestion!

Filippo Tabusso asked Feb 08 '10 16:02



2 Answers

The most efficient way to do this would be to read everything into mutable structures and then convert to immutable ones at the end, but this might require a lot of redundant coding for classes with a lot of fields. So instead, consider using the same pattern that the underlying collection uses: a job with a new task is a new job.
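The first option (mutable while reading, converted once at the end) could look roughly like the sketch below. It is only an illustration under some assumptions: the names FreezeAtEnd, build, jobLines and taskLines are made up here, the two iterators stand in for the lines of jobs.txt and tasks.txt, and it targets a recent Scala (2.13 or later) rather than 2.7.

```scala
import scala.collection.mutable

object FreezeAtEnd {
  class Task(val id: String)
  class Job(val id: String, val tasks: Set[Task]) // tasks is an immutable Set

  // jobLines / taskLines stand in for the lines of jobs.txt and tasks.txt
  def build(jobLines: Iterator[String], taskLines: Iterator[String]): Map[String, Job] = {
    // Mutable only while reading: job id -> tasks collected so far
    val acc = mutable.LinkedHashMap[String, mutable.Set[Task]]()
    for (line <- jobLines if line.trim.nonEmpty)
      acc(line.trim) = mutable.Set[Task]()
    for (line <- taskLines) line.trim.split("\\s+") match {
      case Array(j, t) if acc.contains(j) => acc(j) += new Task(j + "." + t)
      case _ => // unknown job or malformed line: ignored in this sketch
    }
    // Freeze: copy each mutable set into an immutable one, then the map itself
    acc.map { case (id, ts) => id -> new Job(id, ts.toSet) }.toMap
  }
}
```

Nothing mutable escapes build, so callers only ever see immutable jobs. The rest of this answer takes the other route, where a job with a new task is a new job.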

Here's an example that doesn't even bother reading the jobs list--it infers it from the task list. (This is an example that works under 2.7.x; recent versions of 2.8 use "Source.fromPath" instead of "Source.fromFile".)

object Example {
  class Task(val id: String) {
    override def toString = id
  }

  class Job(val id: String, val tasks: Set[Task]) {
    def this(id0: String, old: Option[Job], taskID: String) = {
      this(id0 , old.getOrElse(EmptyJob).tasks + new Task(taskID))
    }
    override def toString = id+" does "+tasks.toString
  }
  object EmptyJob extends Job("",Set.empty[Task]) { }

  def read(fname: String):Map[String,Job] = {
    val map = new scala.collection.mutable.HashMap[String,Job]()
    scala.io.Source.fromFile(fname).getLines.foreach(line => {
      line.split("\t") match {
        case Array(j,t) => {
          val jobID = j.trim
          val taskID = t.trim
          map += (jobID -> new Job(jobID,map.get(jobID),taskID))
        }
        case _ => /* Handle error? */
      }
    })
    new scala.collection.immutable.HashMap() ++ map
  }
}

scala> Example.read("tasks.txt")
res0: Map[String,Example.Job] = Map(JA -> JA does Set(T1, T3, T2), JB -> JB does Set(T2, T1), JC -> JC does Set(T1))

An alternate approach would read the job list (creating jobs as new Job(jobID,Set.empty[Task])), and then handle the error condition of when the task list contained an entry that wasn't in the job list. (You would still need to update the job list map every time you read in a new task.)
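A rough sketch of that alternate approach, assuming a recent Scala (2.13 or later) and in-memory line iterators in place of the two files. The names JobsFirst, withTask, jobLines and taskLines are invented for the illustration, and failing with sys.error is just one possible way to handle the error condition.

```scala
object JobsFirst {
  class Task(val id: String) { override def toString = id }
  class Job(val id: String, val tasks: Set[Task]) {
    // A job with one more task is a new job
    def withTask(t: Task): Job = new Job(id, tasks + t)
  }

  def read(jobLines: Iterator[String], taskLines: Iterator[String]): Map[String, Job] = {
    // Start from the job list: every job begins with an empty task set
    val empty: Map[String, Job] =
      jobLines.map(_.trim).filter(_.nonEmpty)
        .map(id => id -> new Job(id, Set.empty[Task])).toMap
    // Fold over the task lines, replacing the job's map entry each time
    taskLines.foldLeft(empty) { (jobs, line) =>
      line.split("\t") match {
        case Array(j, t) =>
          val jobID = j.trim
          jobs.get(jobID) match {
            case Some(job) => jobs + (jobID -> job.withTask(new Task(t.trim)))
            case None      => sys.error("task line refers to unknown job: " + jobID)
          }
        case _ => sys.error("malformed task line: " + line)
      }
    }
  }
}
```

Here the map itself is immutable too: each task line produces a new map that shares almost all of its structure with the previous one.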

Rex Kerr answered Oct 05 '22 23:10



I made a few changes for it to run on Scala 2.8 (mostly fromPath instead of fromFile, and () after getLines). It may use a few Scala 2.8 features, most notably groupBy, and probably toSet as well, but that one is easy to adapt to 2.7.

I don't have the files to test it, but I changed this stuff from val to def, and the type signatures, at least, match.

class Task(val id: String)  
class Job(val id: String, val tasks: Set[Task])

// read tasks 
val tasks = (
  for {
    line <- io.Source.fromPath("tasks.txt").getLines().toStream
    tokens = line.split("\t") 
    jobId = tokens(0).trim
    task = new Task(jobId + "." + tokens(1).trim) 
  } yield jobId -> task
).groupBy(_._1).map { case (key, value) => key -> value.map(_._2).toSet }

// read jobs 
val jobs = Map() ++ (
  for {
    line <- io.Source.fromPath("jobs.txt").getLines()
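    // getOrElse: a job that has no lines in tasks.txt gets an empty set instead of throwing
    job = new Job(line.trim, tasks.getOrElse(line.trim, Set.empty[Task]))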
  } yield job.id -> job
)
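For anyone without the files at hand, the same grouping idea can be tried on in-memory lines. This is only a sketch, assuming Scala 2.13 or later (where groupMap combines the groupBy and the inner map); GroupByDemo, build, jobLines and taskLines are names invented for the example, and malformed task lines are simply skipped.

```scala
object GroupByDemo {
  class Task(val id: String)
  class Job(val id: String, val tasks: Set[Task])

  def build(jobLines: List[String], taskLines: List[String]): Map[String, Job] = {
    // (jobId, task) pairs; lines without exactly two tab-separated fields are dropped
    val pairs = for {
      line <- taskLines
      tokens = line.split("\t")
      if tokens.length == 2
      jobId = tokens(0).trim
    } yield jobId -> new Task(jobId + "." + tokens(1).trim)

    // Group the tasks by job id and turn each group into an immutable Set
    val tasks: Map[String, Set[Task]] =
      pairs.groupMap(_._1)(_._2).map { case (id, ts) => (id, ts.toSet) }

    // Build each job once, with its complete task set
    jobLines.map(_.trim).filter(_.nonEmpty)
      .map(id => id -> new Job(id, tasks.getOrElse(id, Set.empty[Task])))
      .toMap
  }
}
```

Each Job is constructed exactly once with its final set of tasks, so no intermediate jobs are created and nothing is mutated after construction.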
Daniel C. Sobral answered Oct 06 '22 00:10
