I would like to automatically identify dates in a stream of documents, and for this I would like to use the open source project HeidelTime, available here (https://code.google.com/p/heideltime/). I have installed the HeidelTime kit (not the standalone version) and now I am wondering how I can reference it and call it from my Java project. I have already added a dependency on HeidelTime to my pom.xml:
<dependency>
    <groupId>de.unihd.dbs</groupId>
    <artifactId>heideltime</artifactId>
    <version>1.7</version>
</dependency>
However, I am not sure how to call the classes from this project in my own project. I am using Maven for both. Could anyone who has used it before give me a suggestion or some advice? Many thanks!
heideltime-kit is itself a Maven project, so you can add the heideltime-kit project as a dependency. (In NetBeans, make sure the project is open first, then right-click Dependencies --> Add Dependency --> Open Projects --> HeidelTime.)
Then move the config.props file into your project's src/main/resources folder. Set the path to treetagger within config.props.
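A wrong TreeTagger path is an easy mistake at this step, so a quick sanity check before constructing HeidelTime may save some debugging. The following is only a minimal sketch using java.util.Properties. The class ConfigCheck and the method treeTaggerDir are hypothetical, and the property key treeTaggerHome used in the usage note is an assumption, so check the key name against your own config.props.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of HeidelTime.
class ConfigCheck {

    // Reads the properties file at configPath and returns the directory stored
    // under the given key. Throws if the key is missing or is not a directory.
    static Path treeTaggerDir(String configPath, String key) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get(configPath))) {
            props.load(in);
        }
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("No '" + key + "' entry in " + configPath);
        }
        Path dir = Paths.get(value.trim());
        if (!Files.isDirectory(dir)) {
            throw new IllegalStateException("Not a directory: " + dir);
        }
        return dir;
    }
}
```

You could call it as ConfigCheck.treeTaggerDir("path/to/config.props", "treeTaggerHome") before creating the HeidelTime instance, assuming that key name matches your file.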
As far as using the classes goes, you'll want to create an instance of HeidelTimeStandalone (see de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone.java) using POSTagger.TREETAGGER as the posTagger parameter and a hardcoded path to your src/main/resources/config.props file as the configPath parameter. For example,
HeidelTimeStandalone heidelTime = new HeidelTimeStandalone(Language.ENGLISH,
        DocumentType.COLLOQUIAL,
        OutputType.TIMEML,
        "path/to/config.props",  // configPath
        POSTagger.TREETAGGER,    // posTagger
        true);                   // doIntervalTagging
Then, to process text with HeidelTime, call the process method, passing the text and the document creation time (a java.util.Date):
String result = heidelTime.process(text, date);
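With OutputType.TIMEML, the returned string is the input text with temporal expressions wrapped in TIMEX3 tags. If you only need the normalized values, a minimal sketch using the standard library could look like the following. The class TimexExtractor and the sample input are hypothetical, and the regular expression assumes every tag carries a double-quoted value attribute, so a real XML parser is the safer choice for anything beyond a quick check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HeidelTime.
class TimexExtractor {

    // Matches an opening <TIMEX3 ...> tag and captures its value attribute.
    private static final Pattern TIMEX_VALUE =
            Pattern.compile("<TIMEX3\\b[^>]*\\bvalue=\"([^\"]*)\"[^>]*>");

    // Returns the value attribute of every TIMEX3 tag, in document order.
    static List<String> extractValues(String timeMl) {
        List<String> values = new ArrayList<>();
        Matcher m = TIMEX_VALUE.matcher(timeMl);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Hand-written sample in TimeML style, not real HeidelTime output.
        String sample = "<TimeML>We met on "
                + "<TIMEX3 tid=\"t1\" type=\"DATE\" value=\"2015-03-14\">March 14, 2015</TIMEX3>"
                + " and again in "
                + "<TIMEX3 tid=\"t2\" type=\"DATE\" value=\"2015-04\">April 2015</TIMEX3>."
                + "</TimeML>";
        System.out.println(extractValues(sample)); // prints [2015-03-14, 2015-04]
    }
}
```

In your own code you would pass the result string from heidelTime.process(text, date) to extractValues instead of the sample.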
Adding to the reply from jgloves, you might be interested in parsing the HeidelTime result string into a Java object representation. The following code transforms the UIMA XMI representation into Timex3 objects.
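// Request XMI output so the result can be loaded back into a UIMA CAS
HeidelTimeStandalone time = new HeidelTimeStandalone(Language.GERMAN, DocumentType.SCIENTIFIC, OutputType.XMI, "config.props", POSTagger.STANFORDPOSTAGGER);
String xmiRepresentation = time.process(document, documentCreationTime); // apply HeidelTime and get the UIMA XMI representation
JCas cas = jcasFactory.createJCas();
// Load the XMI string into the CAS; without this step the CAS stays empty
XmiCasDeserializer.deserialize(new ByteArrayInputStream(xmiRepresentation.getBytes(StandardCharsets.UTF_8)), cas.getCas());
for (FSIterator<Annotation> it = cas.getAnnotationIndex(Timex3.type).iterator(); it.hasNext(); ) {
    System.out.println(it.next());
}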