 

Using cTAKES to parse clinical documents

Tags:

java

uima

ctakes

I am trying to figure out how to run the Clinical Document Pipeline from Java. I have a set of clinical documents as plain text. I want to parse these documents and extract a list of (doc_ID, CUI, freq) entries, meaning that document doc_ID contains the concept CUI with frequency freq. I spent several days installing cTAKES and looking for a solution, and narrowed it down to ClinicalPipelineWithUmls.java, which takes a test text and runs SimplePipeline with an AnalysisEngineDescription. Here is part of the code:

// Wrap the test text in an InputStream and build a collection reader over it.
String documentText = "Text of document to test goes here, such as the following. No edema, some soreness, denies pain.";
InputStream inStream = InputStreamCollectionReader.convertToByteArrayInputStream(documentText);
CollectionReader collectionReader = InputStreamCollectionReader.getCollectionReader(inStream);
// Load the aggregate plain-text UMLS processor from its XML descriptor.
AnalysisEngineDescription pipelineIncludingUmlsDictionaries = AnalysisEngineFactory.createAnalysisEngineDescription(
            "desc/analysis_engine/AggregatePlaintextUMLSProcessor");
// Writer that serializes each processed document as XMI into the output directory.
AnalysisEngineDescription xWriter = AnalysisEngineFactory.createPrimitiveDescription(
            XWriter.class,
            XWriter.PARAM_OUTPUT_DIRECTORY_NAME,
            AssertionConst.evalOutputDir,
            XWriter.PARAM_XML_SCHEME_NAME,
            XWriter.XMI,
            XWriter.PARAM_FILE_NAMER_CLASS_NAME,
            CtakesFileNamer.class.getName());
// Run: reader -> UMLS pipeline -> XMI writer.
SimplePipeline.runPipeline(collectionReader, pipelineIncludingUmlsDictionaries, xWriter);
System.out.println("Done at " + new Date());

The problem is that it cannot find InputStreamCollectionReader. I have searched for it without success so far. Could you give me a hint or point me in the right direction? Thanks for any help!
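For the end goal in the question (per-document CUI frequencies), the XWriter in the pipeline should write one XMI file per processed document. The sketch below counts CUIs in one such file using only JDK XML parsing. It assumes, without confirmation from the question, that UMLS hits are serialized as `UmlsConcept` elements carrying a `cui` attribute; the class name `XmiCuiCounter`, the sample XMI, and the placeholder CUIs are all hypothetical, so check the layout against your own output files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmiCuiCounter {

    // Counts the "cui" attribute of every UmlsConcept element in one XMI document.
    // ASSUMPTION: UMLS hits appear as <refsem:UmlsConcept ... cui="..."/>;
    // verify this against the files your XWriter actually produces.
    static Map<String, Integer> countCuis(String xmi) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // match on local name, whatever the prefix
        Document doc = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)));
        NodeList concepts = doc.getElementsByTagNameNS("*", "UmlsConcept");
        Map<String, Integer> freq = new TreeMap<>(); // sorted for stable output
        for (int i = 0; i < concepts.getLength(); i++) {
            String cui = ((Element) concepts.item(i)).getAttribute("cui");
            if (!cui.isEmpty()) {
                freq.merge(cui, 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written sample in the assumed layout; the CUIs are placeholders.
        String xmi = "<xmi:XMI xmlns:xmi=\"http://www.omg.org/XMI\""
            + " xmlns:refsem=\"http:///org/apache/ctakes/typesystem/type/refsem.ecore\">"
            + "<refsem:UmlsConcept xmi:id=\"1\" cui=\"C0000001\"/>"
            + "<refsem:UmlsConcept xmi:id=\"2\" cui=\"C0000001\"/>"
            + "<refsem:UmlsConcept xmi:id=\"3\" cui=\"C0000002\"/>"
            + "</xmi:XMI>";
        String docId = "doc1"; // e.g. derived from the XMI file name
        // Prints one "doc_ID CUI freq" line per concept:
        // doc1 C0000001 2
        // doc1 C0000002 1
        countCuis(xmi).forEach((cui, n) -> System.out.println(docId + " " + cui + " " + n));
    }
}
```

One caveat on the design: a single text mention may carry several UmlsConcept entries (for example one per source code), so this counts concept entries rather than mentions. If you need mention counts, deduplicate CUIs per annotation first.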

asked Oct 21 '13 at 20:10 by user2600417



1 Answer

Is there a particular reason you want to use InputStreamCollectionReader? If not, there are examples of how to use TextReader here.
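Assuming a file-based reader such as TextReader treats each plain-text file in a directory as one document (check the linked examples for its actual parameters), the file name is a natural doc_ID for the question's (doc_ID, CUI, freq) output. The sketch below shows only that file-to-document mapping with plain JDK calls. It is not the TextReader API, and `PlainTextCorpus.readDocuments` is a hypothetical helper.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class PlainTextCorpus {

    // Reads every *.txt file in a directory as one document, keyed by file name.
    // Conceptually similar to a file-based collection reader; plain JDK only.
    static Map<String, String> readDocuments(Path dir) throws IOException {
        Map<String, String> docs = new TreeMap<>(); // sorted by doc ID
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path file : files) {
                String docId = file.getFileName().toString();
                docs.put(docId, new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway corpus in a temp directory (left in place afterwards).
        Path dir = Files.createTempDirectory("notes");
        Files.write(dir.resolve("doc1.txt"), "No edema, some soreness.".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("doc2.txt"), "Denies pain.".getBytes(StandardCharsets.UTF_8));
        // Prints "doc1.txt: No edema, some soreness." then "doc2.txt: Denies pain."
        readDocuments(dir).forEach((id, text) -> System.out.println(id + ": " + text));
    }
}
```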

answered Sep 30 '22 at 14:09 by Renaud
