I have a data structure made of Jobs each containing a set of Tasks. Both Job and Task data are defined in files like these:
jobs.txt:
JA
JB
JC
tasks.txt:
JB T2
JA T1
JC T1
JA T3
JA T2
JB T1
The process of creating objects is the following:
- read each job, create it and store it by id
- read task, retrieve job by id, create task, store task in the job
Once the files are read this data structure is never modified. So I would like that tasks within jobs would be stored in an immutable set. But I don't know how to do it in an efficient way. (Note: the immutable map storing jobs may be left immutable)
Here is a simplified version of the code:
class Task(val id: String)
class Job(val id: String) {
val tasks = collection.mutable.Set[Task]() // This sholud be immutable
}
val jobs = collection.mutable.Map[String, Job]() // This is ok to be mutable
// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) {
val job = new Job(line.trim)
jobs += (job.id -> job)
}
// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
val tokens = line.split("\t")
val job = jobs(tokens(0).trim)
val task = new Task(job.id + "." + tokens(1).trim)
job.tasks += task
}
Thanks in advance for every suggestion!
The most efficient way to do this would be to read everything into mutable structures and then convert to immutable ones at the end, but this might require a lot of redundant coding for classes with a lot of fields. So instead, consider using the same pattern that the underlying collection uses: a job with a new task is a new job.
Here's an example that doesn't even bother reading the jobs list--it infers it from the task list. (This is an example that works under 2.7.x; recent versions of 2.8 use "Source.fromPath" instead of "Source.fromFile".)
object Example {
class Task(val id: String) {
override def toString = id
}
class Job(val id: String, val tasks: Set[Task]) {
def this(id0: String, old: Option[Job], taskID: String) = {
this(id0 , old.getOrElse(EmptyJob).tasks + new Task(taskID))
}
override def toString = id+" does "+tasks.toString
}
object EmptyJob extends Job("",Set.empty[Task]) { }
def read(fname: String):Map[String,Job] = {
val map = new scala.collection.mutable.HashMap[String,Job]()
scala.io.Source.fromFile(fname).getLines.foreach(line => {
line.split("\t") match {
case Array(j,t) => {
val jobID = j.trim
val taskID = t.trim
map += (jobID -> new Job(jobID,map.get(jobID),taskID))
}
case _ => /* Handle error? */
}
})
new scala.collection.immutable.HashMap() ++ map
}
}
scala> Example.read("tasks.txt")
res0: Map[String,Example.Job] = Map(JA -> JA does Set(T1, T3, T2), JB -> JB does Set(T2, T1), JC -> JC does Set(T1))
An alternate approach would read the job list (creating jobs as new Job(jobID,Set.empty[Task])), and then handle the error condition of when the task list contained an entry that wasn't in the job list. (You would still need to update the job list map every time you read in a new task.)
I did a feel changes for it to run on Scala 2.8 (mostly, fromPath
instead of fromFile
, and ()
after getLines
). It may be using a few Scala 2.8 features, most notably groupBy
. Probably toSet
as well, but that one is easy to adapt on 2.7.
I don't have the files to test it, but I changed this stuff from val
to def
, and the type signatures, at least, match.
class Task(val id: String)
class Job(val id: String, val tasks: Set[Task])
// read tasks
val tasks = (
for {
line <- io.Source.fromPath("tasks.txt").getLines().toStream
tokens = line.split("\t")
jobId = tokens(0).trim
task = new Task(jobId + "." + tokens(1).trim)
} yield jobId -> task
).groupBy(_._1).map { case (key, value) => key -> value.map(_._2).toSet }
// read jobs
val jobs = Map() ++ (
for {
line <- io.Source.fromPath("jobs.txt").getLines()
job = new Job(line.trim, tasks(line.trim))
} yield job.id -> job
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With