I have been trying to get the inference code working for a trained Labeled LDA model and pLDA, using the TMT toolbox (Stanford NLP Group). I have gone through the examples provided in the following links: http://nlp.stanford.edu/software/tmt/tmt-0.3/ http://nlp.stanford.edu/software/tmt/tmt-0.4/
Here is the code I'm trying for labeled LDA inference
val modelPath = file("llda-cvb0-59ea15c7-31-61406081-75faccf7");
val model = LoadCVB0LabeledLDA(modelPath);
val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);

val text = {
  source ~>                           // read from the source file
  Column(4) ~>                        // select column containing text
  TokenizeWith(model.tokenizer.get)   // tokenize with model's tokenizer
}

val labels = {
  source ~>                           // read from the source file
  Column(2) ~>                        // take column two, the year
  TokenizeWith(WhitespaceTokenizer())
}

val outputPath = file(modelPath, source.meta[java.io.File].getName.replaceAll(".csv",""));
val dataset = LabeledLDADataset(text, labels, model.termIndex, model.topicIndex);
val perDocTopicDistributions = InferCVB0LabeledLDADocumentTopicDistributions(model, dataset);
val perDocTermTopicDistributions = EstimateLabeledLDAPerWordTopicDistributions(model, dataset, perDocTopicDistributions);

TSVFile(outputPath + "-word-topic-distributions.tsv").write({
  for ((terms, (dId, dists)) <- text.iterator zip perDocTermTopicDistributions.iterator) yield {
    require(terms.id == dId);
    (terms.id,
      for ((term, dist) <- (terms.value zip dists)) yield {
        term + " " + dist.activeIterator.map({
          case (topic, prob) => model.topicIndex.get.get(topic) + ":" + prob
        }).mkString(" ");
      });
  }
});
Error
found : scalanlp.collection.LazyIterable[(String, Array[Double])]
required: Iterable[(String, scalala.collection.sparse.SparseArray[Double])]
EstimateLabeledLDAPerWordTopicDistributions(model, dataset, perDocTopicDistributions);
I understand it's a type mismatch error, but I don't know how to resolve it in Scala. Basically, I don't understand how I should extract (1) the per-document topic distribution and (2) the per-document label distribution from the output of the infer command.
Please help. The same goes for pLDA: I get to the inference command, and after that I'm clueless about what to do with its output.
The Scala type system is much more complex than Java's, and understanding it will make you a better programmer. The problem lies here:
val perDocTermTopicDistributions = EstimateLabeledLDAPerWordTopicDistributions(model, dataset, perDocTopicDistributions);
because perDocTopicDistributions is of type:
scalanlp.collection.LazyIterable[(String, Array[Double])]
while EstimateLabeledLDAPerWordTopicDistributions.apply expects an
Iterable[(String, scalala.collection.sparse.SparseArray[Double])]
The best way to investigate these type errors is to look at the ScalaDoc (for example, the one for TMT is here: http://nlp.stanford.edu/software/tmt/tmt-0.4/api/#package ), and if you cannot easily find where the problem lies, you should make the types of your variables explicit in your code, like the following:
val perDocTopicDistributions:LazyIterable[(String, Array[Double])] = InferCVB0LabeledLDADocumentTopicDistributions(model, dataset)
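To see why explicit annotations help, here is a toy, self-contained illustration (plain Scala with a hypothetical infer stand-in, not the TMT types): with the annotation in place, the compiler reports any mismatch on the declaration line itself, rather than deep inside a later stage call.

```scala
// Hypothetical stand-in for a stage returning per-document distributions.
def infer(): List[(String, Array[Double])] =
  List("doc1" -> Array(0.1, 0.9), "doc2" -> Array(0.7, 0.3))

// Explicitly annotated: if infer() ever returned a different type,
// the compile error would point at this line, not at a downstream call.
val dists: List[(String, Array[Double])] = infer()
```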
If we look together at the ScalaDoc of edu.stanford.nlp.tmt.stage:
def EstimateLabeledLDAPerWordTopicDistributions(model: edu.stanford.nlp.tmt.model.llda.LabeledLDA[_, _, _], dataset: Iterable[LabeledLDADocumentParams], perDocTopicDistributions: Iterable[(String, SparseArray[Double])]): LazyIterable[(String, Array[SparseArray[Double]])]
def InferCVB0LabeledLDADocumentTopicDistributions(model: CVB0LabeledLDA, dataset: Iterable[LabeledLDADocumentParams]): LazyIterable[(String, Array[Double])]
It should now be clear that the return value of InferCVB0LabeledLDADocumentTopicDistributions cannot be fed directly into EstimateLabeledLDAPerWordTopicDistributions.
I have never used Stanford NLP, but this is how the API works by design, so you only need to convert your scalanlp.collection.LazyIterable[(String, Array[Double])] into an Iterable[(String, scalala.collection.sparse.SparseArray[Double])] before calling the function.
If you look at the ScalaDoc, this conversion is pretty simple. Inside the stage package, in package.scala, I can read: import scalanlp.collection.LazyIterable;
So I know where to look, and indeed at http://www.scalanlp.org/docs/core/data/#scalanlp.collection.LazyIterable there is a toIterable method which turns a LazyIterable into an Iterable. You still have to transform the internal Array into a SparseArray, though.
Again, I look into package.scala for the stage package inside TMT and I see: import scalala.collection.sparse.SparseArray; and I look at the Scalala documentation:
http://www.scalanlp.org/docs/scalala/0.4.1-SNAPSHOT/#scalala.collection.sparse.SparseArray
It turns out that the constructors seem complicated to me, so it sounds like I should look into the companion object for a factory method. The method I am looking for is indeed there, and, as usual in Scala, it's called apply.
def apply[T](values: T*)(implicit arg0: ClassManifest[T], arg1: DefaultArrayValue[T]): SparseArray[T]
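Note that apply takes a repeated parameter (values: T*), and in Scala an Array can be passed to a varargs parameter with the `: _*` ascription. Here is a self-contained sketch of that pattern, using a minimal stand-in class (a hypothetical simplification, not scalala's real SparseArray, which also takes the implicit arguments above):

```scala
// Minimal stand-in mimicking a varargs factory such as
// SparseArray.apply[T](values: T*) -- hypothetical simplification.
class SparseStandIn[T](val values: Seq[T])

object SparseStandIn {
  def apply[T](values: T*): SparseStandIn[T] =
    new SparseStandIn(values.toSeq)
}

val dense = Array(0.0, 0.5, 0.0, 0.25)

// `: _*` splats the Array into the varargs parameter:
val sparse = SparseStandIn(dense: _*)
```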
By using this, you can write a function with the following signature:
def f: Array[Double] => SparseArray[Double]
Once this is done, you can turn the result of InferCVB0LabeledLDADocumentTopicDistributions into a non-lazy iterable of SparseArrays with one line of code:
result.toIterable.map { case (name, values) => (name, f(values)) }
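Putting it together, the conversion can be sketched as follows. Since I don't have scalanlp/scalala at hand, LazyIterableStandIn and SparseStandIn below are hypothetical stand-ins for the real LazyIterable and SparseArray, included only to show the shape of the transformation:

```scala
// Hypothetical stand-ins for the library types involved.
class SparseStandIn[T](val values: Seq[T])
object SparseStandIn {
  def apply[T](values: T*): SparseStandIn[T] = new SparseStandIn(values.toSeq)
}
class LazyIterableStandIn[A](contents: => Iterable[A]) {
  def toIterable: Iterable[A] = contents  // mirrors LazyIterable.toIterable
}

// The conversion function f: Array[Double] => SparseArray[Double]
def f(values: Array[Double]): SparseStandIn[Double] =
  SparseStandIn(values: _*)

// A fake inference result, shaped like LazyIterable[(String, Array[Double])]:
val result = new LazyIterableStandIn(
  Seq("doc1" -> Array(0.1, 0.9), "doc2" -> Array(0.7, 0.3)))

// The conversion one-liner: force the lazy iterable, sparsify each row.
val converted: Iterable[(String, SparseStandIn[Double])] =
  result.toIterable.map { case (name, values) => (name, f(values)) }
```

With the real types, f would be built on SparseArray.apply in the same way, and converted would then satisfy the Iterable[(String, SparseArray[Double])] parameter of EstimateLabeledLDAPerWordTopicDistributions.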