I'm trying to use a Java library called langdetect, hosted here. It couldn't be easier to use:
Detector detector;
String langDetected = "";
try {
    String path = "C:/Users/myUser/Desktop/jars/langdetect/profiles";
    DetectorFactory.loadProfile(path);
    detector = DetectorFactory.create();
    detector.append(text);
    langDetected = detector.detect();
}
catch (LangDetectException e) {
    throw e;
}
return langDetected;
Except with respect to the DetectorFactory.loadProfile method. This library works great when I pass it an absolute file path, but ultimately I think I need to package my code and langdetect's companion profiles directory inside the same JAR file:
myapp.jar/
    META-INF/
    langdetect/
        profiles/
            af
            bn
            en
            ...etc.
    com/
        me/
            myorg/
                LangDetectAdaptor --> is what actually uses the code above
I will make sure that the LangDetectAdaptor, which is located inside myapp.jar, is supplied with both the langdetect.jar and jsonic.jar dependencies it needs for langdetect to work at runtime. However, I'm confused as to what I need to pass in to DetectorFactory.loadProfile in order for it to work:
The langdetect JAR ships with the profiles directory, but you need to initialize it from inside your JAR. So do I copy the profiles directory and put it inside my JAR (as I describe above), or is there a way to keep it inside langdetect.jar but access it from inside my code?
Thanks in advance for any help here!
Edit: I think the problem here is that langdetect ships with this profiles directory, but then wants you to initialize it from inside your JAR. The API would probably benefit from being changed a little bit to just consider profiles its own configuration, and to then provide methods like DetectorFactory.loadProfiles().except("fr") in the event that you don't want it to initialize French, etc. But this still doesn't solve my problem!
I had the same problem. You can load the profiles from the LangDetect jar using JarURLConnection and JarEntry. Note that this example uses Java 7 try-with-resources.
// Requires Apache Commons IO for IOUtils; the other classes come from
// java.io, java.net, java.util and java.util.jar.
String dirname = "profiles/";
Enumeration<URL> en = Detector.class.getClassLoader().getResources(dirname);
List<String> profiles = new ArrayList<>();
if (en.hasMoreElements()) {
    URL url = en.nextElement();
    JarURLConnection urlcon = (JarURLConnection) url.openConnection();
    try (JarFile jar = urlcon.getJarFile()) {
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            // Skip the "profiles/" directory entry itself and anything outside it.
            if (entry.isDirectory() || !entry.getName().startsWith(dirname)) {
                continue;
            }
            try (InputStream in = jar.getInputStream(entry)) {
                profiles.add(IOUtils.toString(in, "UTF-8"));
            }
        }
    }
}
DetectorFactory.loadProfile(profiles);
Detector detector = DetectorFactory.create();
detector.append(text);
String langDetected = detector.detect();
System.out.println(langDetected);
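If you would rather not pull in Commons IO just for IOUtils, the jar-reading part can be done with the JDK alone. This is only a rough sketch: JarProfileReader and readEntries are names I made up for illustration, and it assumes you already know the path to the jar on disk rather than discovering it through the class loader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of langdetect.
public class JarProfileReader {

    /** Reads every non-directory entry under dirPrefix (e.g. "profiles/") as UTF-8 text. */
    public static List<String> readEntries(String jarPath, String dirPrefix) throws IOException {
        List<String> contents = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                // Directory entries carry no data; entries elsewhere are not profiles.
                if (entry.isDirectory() || !entry.getName().startsWith(dirPrefix)) {
                    continue;
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    contents.add(readAll(in));
                }
            }
        }
        return contents;
    }

    // Drains the stream into a String without any third-party library.
    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The returned list can then be handed to DetectorFactory.loadProfile(profiles), the same call used in the snippet above.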
Since no Maven support was available, and the mechanism to load profiles was not perfect (since you need to point it at files instead of resources), I created a fork which solves that problem:
https://github.com/galan/language-detector
I mailed the original author so he could fork/maintain the changes, but no luck; it seems the project is abandoned.
Here is an example of how to use it now (you can write your own profiles where necessary):
DetectorFactory.loadProfile(new DefaultProfile()); // SmProfile is also available
Detector detector = DetectorFactory.create();
detector.append(input);
String result = detector.detect();
// maybe work with detector.getProbabilities()
I don't like the static approach the DetectorFactory uses, but I won't rewrite the full project; you would have to create your own fork/pull request :)
Looks like the library only accepts files. You can either change the code and try submitting the changes upstream, or write your resources to a temp file and get it to load that.
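The temp-file route might look roughly like the sketch below. ProfileExtractor and extractToTempDir are hypothetical names, not part of langdetect, and I'm assuming you know the profile file names up front (af, bn, en, ...), because a directory inside a JAR cannot be listed portably through the class loader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper, not part of langdetect.
public class ProfileExtractor {

    /**
     * Copies the named classpath resources under resourceDir into a fresh
     * temporary directory and returns that directory.
     */
    public static Path extractToTempDir(ClassLoader loader, String resourceDir,
                                        List<String> names) throws IOException {
        Path tempDir = Files.createTempDirectory("langdetect-profiles");
        for (String name : names) {
            String resource = resourceDir + "/" + name;
            try (InputStream in = loader.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IOException("Resource not found on classpath: " + resource);
                }
                // Each profile keeps its original file name inside the temp directory.
                Files.copy(in, tempDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return tempDir;
    }
}
```

You would then pass the result to the path-based call from the question, e.g. DetectorFactory.loadProfile(tempDir.toString()). Nothing here cleans up the temp directory afterwards, so you may want to delete it on shutdown.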