How to open and manipulate Word document/template in Java?

Question

I need to open a .doc/.dot/.docx/.dotx (I'm not picky, I just want it to work) document, parse it for placeholders (or something similar), put my own data, and then return generated .doc/.docx/.dotx/.pdf document.

And on top of all that, I need the tools to accomplish that to be free.

I've searched around for something that would suit my needs, but I can't find anything. Tools like Docmosis, Javadocx, Aspose etc. are commercial. From what I've read, Apache POI is nowhere near successfully implementing this (they currently don't have any official developer working on Word part of framework).

The only thing that looks that could do the trick is OpenOffice UNO API. But that is a pretty big byte for someone that has never used this API (like me).

So if I am going to jump into this, I need to make sure that I am on the right path.

Can someone give me some advice on this?

kensvebary · Accepted Answer

I know it's been a long time since I've posted this question, and I said that I would post my solution when I'm finished. So here it is.

I hope that it will help someone someday. This is a full working class, and all you have to do is put it in your application, and place TEMPLATE_DIRECTORY_ROOT directory with .docx templates in your root directory.

Usage is very simple. You put placeholders (key) in your .docx file, and then pass file name and Map containing corresponding key-value pairs for that file.

Enjoy!

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.URI;
import java.util.Deque;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;
import java.util.UUID;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

import javax.faces.context.ExternalContext;
import javax.faces.context.FacesContext;
import javax.servlet.http.HttpServletResponse;

public class DocxManipulator {

    private static final String MAIN_DOCUMENT_PATH = "word/document.xml";
    private static final String TEMPLATE_DIRECTORY_ROOT = "TEMPLATES_DIRECTORY/";


    /*    PUBLIC METHODS    */

    /**
     * Generates .docx document from given template and the substitution data
     * 
     * @param templateName
     *            Template data
     * @param substitutionData
     *            Hash map with the set of key-value pairs that represent
     *            substitution data
     * @return
     */
    public static Boolean generateAndSendDocx(String templateName, Map<String,String> substitutionData) {

        String templateLocation = TEMPLATE_DIRECTORY_ROOT + templateName;

        String userTempDir = UUID.randomUUID().toString();
        userTempDir = TEMPLATE_DIRECTORY_ROOT + userTempDir + "/";

        try {

            // Unzip .docx file
            unzip(new File(templateLocation), new File(userTempDir));       

            // Change data
            changeData(new File(userTempDir + MAIN_DOCUMENT_PATH), substitutionData);

            // Rezip .docx file
            zip(new File(userTempDir), new File(userTempDir + templateName));

            // Send HTTP response
            sendDOCXResponse(new File(userTempDir + templateName), templateName);

            // Clean temp data
            deleteTempData(new File(userTempDir));
        } 
        catch (IOException ioe) {
            System.out.println(ioe.getMessage());
            return false;
        }

        return true;
    }


    /*    PRIVATE METHODS    */

    /**
     * Unzipps specified ZIP file to specified directory
     * 
     * @param zipfile
     *            Source ZIP file
     * @param directory
     *            Destination directory
     * @throws IOException
     */
    private static void unzip(File zipfile, File directory) throws IOException {

        ZipFile zfile = new ZipFile(zipfile);
        Enumeration<? extends ZipEntry> entries = zfile.entries();

        while (entries.hasMoreElements()) {
          ZipEntry entry = entries.nextElement();
          File file = new File(directory, entry.getName());
          if (entry.isDirectory()) {
            file.mkdirs();
          } 
          else {
            file.getParentFile().mkdirs();
            InputStream in = zfile.getInputStream(entry);
            try {
              copy(in, file);
            } 
            finally {
              in.close();
            }
          }
        }
      }


    /**
     * Substitutes keys found in target file with corresponding data
     * 
     * @param targetFile
     *            Target file
     * @param substitutionData
     *            Map of key-value pairs of data
     * @throws IOException
     */
    @SuppressWarnings({ "unchecked", "rawtypes" })
    private static void changeData(File targetFile, Map<String,String> substitutionData) throws IOException{

        BufferedReader br = null;
        String docxTemplate = "";
        try {
            br = new BufferedReader(new InputStreamReader(new FileInputStream(targetFile), "UTF-8"));
            String temp;
            while( (temp = br.readLine()) != null)
                docxTemplate = docxTemplate + temp; 
            br.close();
            targetFile.delete();
        } 
        catch (IOException e) {
            br.close();
            throw e;
        }

        Iterator substitutionDataIterator = substitutionData.entrySet().iterator();
        while(substitutionDataIterator.hasNext()){
            Map.Entry<String,String> pair = (Map.Entry<String,String>)substitutionDataIterator.next();
            if(docxTemplate.contains(pair.getKey())){
                if(pair.getValue() != null)
                    docxTemplate = docxTemplate.replace(pair.getKey(), pair.getValue());
                else
                    docxTemplate = docxTemplate.replace(pair.getKey(), "NEDOSTAJE");
            }
        }

        FileOutputStream fos = null;
        try{
            fos = new FileOutputStream(targetFile);
            fos.write(docxTemplate.getBytes("UTF-8"));
            fos.close();
        }
        catch (IOException e) {
            fos.close();
            throw e;
        }
    }

    /**
     * Zipps specified directory and all its subdirectories
     * 
     * @param directory
     *            Specified directory
     * @param zipfile
     *            Output ZIP file name
     * @throws IOException
     */
    private static void zip(File directory, File zipfile) throws IOException {

        URI base = directory.toURI();
        Deque<File> queue = new LinkedList<File>();
        queue.push(directory);
        OutputStream out = new FileOutputStream(zipfile);
        Closeable res = out;

        try {
          ZipOutputStream zout = new ZipOutputStream(out);
          res = zout;
          while (!queue.isEmpty()) {
            directory = queue.pop();
            for (File kid : directory.listFiles()) {
              String name = base.relativize(kid.toURI()).getPath();
              if (kid.isDirectory()) {
                queue.push(kid);
                name = name.endsWith("/") ? name : name + "/";
                zout.putNextEntry(new ZipEntry(name));
              } 
              else {
                if(kid.getName().contains(".docx"))
                    continue;  
                zout.putNextEntry(new ZipEntry(name));
                copy(kid, zout);
                zout.closeEntry();
              }
            }
          }
        } 
        finally {
          res.close();
        }
      }

    /**
     * Sends HTTP Response containing .docx file to Client
     * 
     * @param generatedFile
     *            Path to generated .docx file
     * @param fileName
     *            File name of generated file that is being presented to user
     * @throws IOException
     */
    private static void sendDOCXResponse(File generatedFile, String fileName) throws IOException {

        FacesContext facesContext = FacesContext.getCurrentInstance();
        ExternalContext externalContext = facesContext.getExternalContext();
        HttpServletResponse response = (HttpServletResponse) externalContext
                .getResponse();

        BufferedInputStream input = null;
        BufferedOutputStream output = null;

        response.reset();
        response.setHeader("Content-Type", "application/msword");
        response.setHeader("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
        response.setHeader("Content-Length",String.valueOf(generatedFile.length()));

        input = new BufferedInputStream(new FileInputStream(generatedFile), 10240);
        output = new BufferedOutputStream(response.getOutputStream(), 10240);

        byte[] buffer = new byte[10240];
        for (int length; (length = input.read(buffer)) > 0;) {
            output.write(buffer, 0, length);
        }

        output.flush();
        input.close();
        output.close();

        // Inform JSF not to proceed with rest of life cycle
        facesContext.responseComplete();
    }


    /**
     * Deletes directory and all its subdirectories
     * 
     * @param file
     *            Specified directory
     * @throws IOException
     */
    public static void deleteTempData(File file) throws IOException {

        if (file.isDirectory()) {

            // directory is empty, then delete it
            if (file.list().length == 0)
                file.delete();
            else {
                // list all the directory contents
                String files[] = file.list();

                for (String temp : files) {
                    // construct the file structure
                    File fileDelete = new File(file, temp);
                    // recursive delete
                    deleteTempData(fileDelete);
                }

                // check the directory again, if empty then delete it
                if (file.list().length == 0)
                    file.delete();
            }
        } else {
            // if file, then delete it
            file.delete();
        }
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {

        byte[] buffer = new byte[1024];
        while (true) {
          int readCount = in.read(buffer);
          if (readCount < 0) {
            break;
          }
          out.write(buffer, 0, readCount);
        }
      }

      private static void copy(File file, OutputStream out) throws IOException {
        InputStream in = new FileInputStream(file);
        try {
          copy(in, out);
        } finally {
          in.close();
        }
      }

      private static void copy(InputStream in, File file) throws IOException {
        OutputStream out = new FileOutputStream(file);
        try {
          copy(in, out);
        } finally {
          out.close();
        }
     }

}

meriton · Answer

Since a docx file is merely a zip-archive of xml files (plus any binary files for embedded objects such as images), we met that requirement by unpacking the zip file, feeding the document.xml to a template engine (we used freemarker) that does the merging for us, and then zipping the output document to get the new docx file.

The template document then is simply an ordinary docx with embedded freemarker expressions / directives, and can be edited in Word.

Since (un)zipping can be done with the JDK, and Freemarker is open source, you don't incur any licence fees, not even for word itself.

The limitation is that this approach can only emit docx or rtf files, and the output document will have the same filetype as the template. If you need to convert the document to another format (such as pdf) you'll have to solve that problem separately.

Fabrizio · Answer

I ended up relying on Apache Poi 3.12 and processing paragraphs (separately extracting paragraphs also from tables, headers/footers, and footnotes, as such paragraphs aren't returned by XWPFDocument.getParagraphs() ).

The processing code (~100 lines) and unit tests are here on github.

How to open and manipulate Word document/template in Java?

Tags:

java

ms-word

kensvebary

3 Answers

kensvebary

meriton

Fabrizio

Recent Activity

Donate For Us

How to open and manipulate Word document/template in Java?

Tags:

java

ms-word

kensvebary

3 Answers

kensvebary

meriton

Fabrizio

Related questions

Recent Activity

Donate For Us