
How to use cTAKES from the command line?

Tags: nlp, ctakes

I wonder how to use Apache cTAKES from the command line.

For example:

  • I have a file note.txt that contains some text like "Patient had elevated blood sugar but tests confirm no diabetes. Patient's father had adult onset diabetes."
  • I want to use the provided analysis engine \apache-ctakes-3.2.2-bin\apache-ctakes-3.2.2\desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextUMLSProcessor.xml

How can I get the analysis engine's output (viz. the annotations) using the command line (i.e. without using graphical user interfaces such as UIMA CAS Visual Debugger or the Collection Processing Engine)? I'd prefer to use the provided JAR files rather than having to compile the code.

The question is fairly simple but I couldn't find the information in cTAKES's README or on Confluence.

Franck Dernoncourt asked Oct 04 '15 23:10


2 Answers

Please try the following steps to use cTAKES CPE from the command line (the key class is "org.apache.uima.examples.cpe.SimpleRunCPE"):

  1. Change directory to $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/

  2. Copy test_plaintext.xml to another file (e.g., "test_plaintext_test.xml").

  3. Edit "test_plaintext_test.xml" to set the input directory: find the "nameValuePair" with name = "InputDirectory", and set the value string to the input directory. The following example sets the input directory to "$CTAKES_HOME/note_input":

    <nameValuePair>
        <name>InputDirectory</name>
        <value>
            <string>note_input</string>
        </value>
    </nameValuePair>
    
  4. Similarly, edit "test_plaintext_test.xml" to set the output directory ("$CTAKES_HOME/result_output" in the following example):

    <nameValuePair>
        <name>OutputDirectory</name>
        <value>
            <string>result_output</string>
        </value>
    </nameValuePair>
    
  5. Save "test_plaintext_test.xml" and change directory to $CTAKES_HOME/bin.

  6. Copy runctakesCPE.sh to another file (e.g., "runctakesCPE_CLI.sh").

  7. Edit "runctakesCPE_CLI.sh": replace the last line ("java ...") with the following line ("USER" and "PW" should be replaced by your UMLS username and password, and the memory settings -Xms and -Xmx may be adjusted based on the amount of memory on your machine):

    java -Dctakes.umlsuser=USER -Dctakes.umlspw=PW -cp $CTAKES_HOME/lib/*:$CTAKES_HOME/desc/:$CTAKES_HOME/resources/ -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms2g -Xmx3g org.apache.uima.examples.cpe.SimpleRunCPE $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext_test.xml
    
  8. Save "runctakesCPE_CLI.sh", and then create the input directory ("$CTAKES_HOME/note_input") and the output directory ("$CTAKES_HOME/result_output").

  9. Put your note.txt into the input directory (e.g., "$CTAKES_HOME/note_input/note.txt"), and then run "runctakesCPE_CLI.sh".

  10. cTAKES CPE will start running in command-line mode, and the resulting file will be generated in the output directory (e.g., "$CTAKES_HOME/result_output/note.txt.xml").
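
Steps 1 to 5, 8, and the copy in step 9 can also be scripted. The following is only a minimal sketch and not something shipped with cTAKES: the function names set_cpe_param and prepare_cpe_run are made up for illustration, and the awk rewrite assumes that each nameValuePair is laid out as in the excerpts above, with <name> appearing before its <string> value. Editing the launcher (steps 6 and 7) is still done by hand.

```shell
#!/usr/bin/env bash
# Sketch only: these helper functions are hypothetical, not shipped with cTAKES.

# set_cpe_param FILE NAME VALUE
# Replaces the <string> value of the nameValuePair whose <name> is NAME.
# Assumes <name> precedes its <string>, and that VALUE contains no "&" or
# backslash (both are special in an awk replacement string).
set_cpe_param() {
    local file=$1 name=$2 value=$3
    awk -v name="$name" -v value="$value" '
        index($0, "<name>" name "</name>") { pending = 1 }
        pending && /<string>.*<\/string>/ {
            sub(/<string>.*<\/string>/, "<string>" value "</string>")
            pending = 0
        }
        { print }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# prepare_cpe_run CTAKES_HOME NOTE_FILE
# Copies the CPE descriptor, points it at note_input/result_output,
# creates both directories, and copies the note into the input directory.
prepare_cpe_run() {
    local home=$1 note=$2
    local cpe_dir="$home/desc/ctakes-clinical-pipeline/desc/collection_processing_engine"
    local descriptor="$cpe_dir/test_plaintext_test.xml"
    cp "$cpe_dir/test_plaintext.xml" "$descriptor" &&
        set_cpe_param "$descriptor" InputDirectory note_input &&
        set_cpe_param "$descriptor" OutputDirectory result_output &&
        mkdir -p "$home/note_input" "$home/result_output" &&
        cp "$note" "$home/note_input/"
}
```

Calling prepare_cpe_run "$CTAKES_HOME" note.txt leaves the original test_plaintext.xml untouched; afterwards, run the edited "runctakesCPE_CLI.sh" as described in step 9.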

I actually used your note.txt to run the steps above and here are the first several lines of the generated note.txt.xml:

    <?xml version="1.0" encoding="UTF-8"?><CAS version="2">
        <uima.cas.Sofa _indexed="0" _id="3" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="Patient had elevated blood sugar but tests confirm no diabetes. Patient's father had adult onset diabetes.&#10;"/>
        <org.apache.ctakes.typesystem.type.structured.DocumentID _indexed="1" _id="1" documentID="note.txt"/>
        <uima.tcas.DocumentAnnotation _indexed="1" _id="10" _ref_sofa="3" begin="0" end="107" language="x-unspecified"/>
        <org.apache.ctakes.typesystem.type.textspan.Segment _indexed="1" _id="15" _ref_sofa="3" begin="0" end="107" id="SIMPLE_SEGMENT"/>
        <org.apache.ctakes.typesystem.type.textspan.Sentence _indexed="1" _id="21" _ref_sofa="3" begin="0" end="63" sentenceNumber="0"/>
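
Since each annotation in this file is a single XML element with begin and end attributes, a quick look at the results is possible with standard text tools. The sketch below is hedged accordingly: list_types and list_spans are hypothetical helper names, the patterns assume the attribute layout shown in the excerpt above, and a proper XML parser would be more robust for anything beyond a quick inspection.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of cTAKES or UIMA.

# list_types FILE
# Counts how often each cTAKES type-system element occurs in the output.
list_types() {
    grep -o '<org\.apache\.ctakes\.typesystem[^ />]*' "$1" |
        sed 's/^<//' | sort | uniq -c
}

# list_spans FILE TYPE
# Prints "begin end" offsets for every element named TYPE. Assumes the end
# attribute directly follows begin, as in the excerpt above. Dots in TYPE
# are treated as regex wildcards, which is harmless for these names.
list_spans() {
    grep -o "<$2 [^>]*>" "$1" |
        sed -n 's/.* begin="\([0-9]*\)" end="\([0-9]*\)".*/\1 \2/p'
}
```

For the excerpt above, list_spans result_output/note.txt.xml org.apache.ctakes.typesystem.type.textspan.Sentence would include the line "0 63", which are the character offsets of the first sentence within sofaString.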

Hope this helps :-)

Tsung-Ting Kuo answered Nov 12 '22 07:11


    java -Dctakes.umlsuser=USER -Dctakes.umlspw=PW -cp $CTAKES_HOME/lib/*:$CTAKES_HOME/desc/:$CTAKES_HOME/resources/ -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms2g -Xmx3g to_replace $CTAKES_HOME/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext_test.xml

Replace "to_replace" with either org.apache.ctakes.ytex.tools.RunCPE or org.apache.ctakes.core.cpe.CmdLineCpeRunner. The classpath entries here are separated with ":", as Java expects on Linux and macOS; on Windows, Java expects ";" instead.
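
Because the classpath separator differs between platforms (Java expects ":" on Linux and macOS and ";" on Windows), a launcher can choose it at run time. This is only a sketch under stated assumptions: classpath_separator and print_cpe_command are hypothetical names, the platform test relies on the usual `uname -s` prefixes reported by Cygwin and MSYS/Git Bash shells, and the command is printed rather than executed so it can be checked first.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of cTAKES.

# classpath_separator UNAME
# Returns ";" for Windows-hosted shells (Cygwin, MSYS, Git Bash), ":" otherwise.
classpath_separator() {
    case "$1" in
        CYGWIN*|MINGW*|MSYS*) printf ';' ;;
        *) printf ':' ;;
    esac
}

# print_cpe_command CTAKES_HOME RUNNER_CLASS
# Prints (does not run) the java command line for the given runner class.
# USER and PW are placeholders for the UMLS credentials.
print_cpe_command() {
    local home=$1 runner=$2 sep
    sep=$(classpath_separator "$(uname -s)")
    printf 'java -Dctakes.umlsuser=USER -Dctakes.umlspw=PW -cp "%s" -Dlog4j.configuration=file:%s/config/log4j.xml -Xms2g -Xmx3g %s %s\n' \
        "$home/lib/*$sep$home/desc/$sep$home/resources/" \
        "$home" \
        "$runner" \
        "$home/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext_test.xml"
}
```

For example, print_cpe_command "$CTAKES_HOME" org.apache.ctakes.core.cpe.CmdLineCpeRunner prints the full command with the second runner class filled in. The classpath is quoted in the printed command so that the shell passes the "lib/*" wildcard to Java unexpanded.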

Guanhua Lee answered Nov 12 '22 05:11