
How to add new mime type to apache tika

I am using Apache Tika to detect MIME types. I am trying to add a new MIME type (for Java .properties files) and have Tika detect it.

This is my class file:

package check_mime;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.tika.Tika;
import org.apache.tika.mime.MimeTypes;


public class TikaFileTypeDetector {

    private final Tika tika = new Tika();

    public TikaFileTypeDetector() {
        super();
    }

    public String probeContentType(Path path) throws IOException {

        // Detect from the file itself (Tika uses the content and the file name here)
        String fileContentDetect = tika.detect(path.toFile());
        if (!fileContentDetect.equals(MimeTypes.OCTET_STREAM)) {
            return fileContentDetect;
        }

        // Try file name only if content search was not successful
        String fileNameDetect = tika.detect(path.toString());
        if (!fileNameDetect.equals(MimeTypes.OCTET_STREAM)) {
            return fileNameDetect;
        }

        return null;
    }

    public static void main(String[] args) throws IOException {

        if (args.length != 1) {
            printUsage();
            return;
        }
        Path path = Paths.get(args[0]);

        TikaFileTypeDetector detector = new TikaFileTypeDetector();

        String contentType = detector.probeContentType(path);

        System.out.println("File is of type - " + contentType);
    }

    public static void printUsage() {
        // Print the expected command line: exactly one argument, the file to probe
        System.out.println("Usage: java -classpath ... "
                + TikaFileTypeDetector.class.getName()
                + " <file>");
    }
}

Following the docs, I have created a custom XML file:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <mime-type type="text/properties">
    <glob pattern="*.properties"/>
  </mime-type>
</mime-info>

Now, how do I add this to my program so that Tika reads it? Do I have to create a parser? I'm stuck here.

kittu asked Jun 17 '15 15:06




1 Answer

This is covered in the Apache Tika "5 minute parser" instructions. You don't need to write a parser just to have the type detected; a custom MIME type definition is enough. To add support for Java .properties files, first create a file called custom-mimetypes.xml and populate it with something like:

<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <mime-type type="text/properties">
    <_comment>Java Properties</_comment>
    <glob pattern="*.properties"/>
    <sub-class-of type="text/plain"/>
  </mime-type>
</mime-info>
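
Before deploying the file, it can help to confirm that it is well-formed and declares the type and glob you expect. The following is a minimal sketch that uses only the JDK's DOM parser; it is not how Tika itself reads the file, and the class and method names (MimeInfoCheck, globsByType) are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: a sanity check for a mime-info file, independent of Tika.
class MimeInfoCheck {

    /** Parses the XML and returns each declared mime-type with its glob patterns. */
    static Map<String, List<String>> globsByType(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList types = doc.getElementsByTagName("mime-type");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            List<String> globs = new ArrayList<>();
            NodeList globNodes = type.getElementsByTagName("glob");
            for (int j = 0; j < globNodes.getLength(); j++) {
                globs.add(((Element) globNodes.item(j)).getAttribute("pattern"));
            }
            result.put(type.getAttribute("type"), globs);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Same content as the custom-mimetypes.xml shown above.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<mime-info>\n"
                + "  <mime-type type=\"text/properties\">\n"
                + "    <_comment>Java Properties</_comment>\n"
                + "    <glob pattern=\"*.properties\"/>\n"
                + "    <sub-class-of type=\"text/plain\"/>\n"
                + "  </mime-type>\n"
                + "</mime-info>\n";
        System.out.println(globsByType(xml)); // prints {text/properties=[*.properties]}
    }
}
```

If the parse throws, the file is not well-formed XML (a common cause is whitespace before the <?xml ...?> declaration), and Tika is unlikely to accept it either.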

Next, you need to put that file somewhere Tika can find it, under the right name: it must be available as org/apache/tika/mime/custom-mimetypes.xml on your classpath. The easiest way to do this is to create that directory structure, move the new file into it, and then add the root directory to your classpath. For deployment, you should wrap it up into a jar and put that jar on the classpath.
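
The directory layout described above can be sketched with the JDK alone. This example writes the file under a root directory and then asks a class loader rooted at that directory whether it can see the resource under the required name. It does not involve Tika; the class and method names (CustomMimeTypesLayout, install, findOnClasspath) are hypothetical, and a temporary directory stands in for your real classpath root:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: shows the classpath layout only, without calling Tika.
class CustomMimeTypesLayout {

    // Resource name from the answer above; the file must be visible under exactly this path.
    static final String RESOURCE = "org/apache/tika/mime/custom-mimetypes.xml";

    /** Writes the XML to root/org/apache/tika/mime/custom-mimetypes.xml. */
    static Path install(Path root, String xml) throws IOException {
        Path target = root.resolve(RESOURCE);
        Files.createDirectories(target.getParent());
        Files.write(target, xml.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    /**
     * Returns the URL under which a class loader rooted at 'root' sees the file,
     * or null if it is not visible. 'root' must be an existing directory.
     */
    static URL findOnClasspath(Path root) throws IOException {
        URL[] classpath = { root.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(classpath, null)) {
            return loader.getResource(RESOURCE);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the real classpath root.
        Path root = Files.createTempDirectory("tika-custom-mime");
        install(root, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<mime-info/>\n");
        System.out.println("Visible as: " + findOnClasspath(root));
    }
}
```

If findOnClasspath returns null for your real root directory, the path or file name is off, and Tika will not pick the file up from that root either.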

You can use the Tika App to check that your MIME type file was loaded, if you're careful. With your file packaged as a jar, run something like:

java -classpath tika-app-1.10-SNAPSHOT.jar:my-custom-mimetypes.jar org.apache.tika.cli.TikaCLI --list-supported-types | grep text/properties

Alternatively, if you have it in a local directory, try something like:

ls -l org/apache/tika/mime/custom-mimetypes.xml
# Check a file was found, with some content in it
java -classpath tika-app-1.10-SNAPSHOT.jar:. org.apache.tika.cli.TikaCLI --list-supported-types | grep text/properties

If that doesn't show your MIME type, then the path or filename is not correct; double-check both.

(Alternatively, upgrade to a newer version of Apache Tika: since r1686315, Tika has a Java Properties MIME type built in!)

Gagravarr answered Oct 30 '22 07:10