 

How to find and replace text in Word files (both .doc and .docx)

Tags:

java

I want to find and replace text in .doc and .docx files using Java.

What I tried: reading those files as plain text files, but that didn't work.

I have no idea how to proceed or what else to try. Can anyone point me in the right direction?

asked Dec 01 '16 13:12 by ROHITHKUMAR A


2 Answers

I hope this solves your problem, my friend. I have written it for .docx files, using Apache POI to search and replace. I recommend reading the complete Apache POI documentation for more.

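import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class Find_Replace_DOCX {

    public static void main(String[] args) throws IOException {
        /*
         * If the uploaded file is a .doc, use HWPF (HWPFDocument);
         * if it is a .docx, use XWPFDocument as below.
         */
        try (InputStream in = new FileInputStream("d:\\1\\rpt.docx");
             XWPFDocument doc = new XWPFDocument(in)) {

            // Replace the placeholder in ordinary body paragraphs.
            // This only matches placeholders that sit inside a single run.
            for (XWPFParagraph p : doc.getParagraphs()) {
                for (XWPFRun r : p.getRuns()) {
                    String text = r.getText(0);
                    if (text != null && text.contains("$$key$$")) {
                        text = text.replace("$$key$$", "ABCD"); // your content
                        r.setText(text, 0);
                    }
                }
            }

            // Paragraphs inside tables are not returned by getParagraphs(),
            // so walk every table cell separately.
            for (XWPFTable tbl : doc.getTables()) {
                for (XWPFTableRow row : tbl.getRows()) {
                    for (XWPFTableCell cell : row.getTableCells()) {
                        for (XWPFParagraph p : cell.getParagraphs()) {
                            for (XWPFRun r : p.getRuns()) {
                                String text = r.getText(0);
                                if (text != null && text.contains("$$key$$")) {
                                    text = text.replace("$$key$$", "abcd");
                                    r.setText(text, 0);
                                }
                            }
                        }
                    }
                }
            }

            // Save to a new file; try-with-resources closes the output stream.
            try (OutputStream out = new FileOutputStream("d:\\1\\output.docx")) {
                doc.write(out);
            }
        }
    }
}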
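One caveat with the per-run check above: Word frequently splits what looks like a single word across several runs (for example after a spelling check or a formatting change), so `r.getText(0)` may return `$$ke` for one run and `y$$` for the next, and `text.contains("$$key$$")` never matches. A common workaround is to join the texts of all runs in a paragraph, search the joined string, and then edit only the affected runs. The following is a minimal sketch of that bookkeeping alone, using a plain `List<StringBuilder>` as a stand-in for the runs' texts; `CrossRunReplace` and `replaceAcrossRuns` are hypothetical names, not part of Apache POI, and copying the results back into each run with `XWPFRun.setText(text, 0)` is left as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Apache POI): replaces text that may span several runs. */
public class CrossRunReplace {

    /**
     * Replaces every occurrence of target in the concatenation of runs.
     * The replacement is inserted into the run where a match starts, and the
     * matched characters are deleted from that run and any following runs.
     * Returns the number of replacements made.
     */
    public static int replaceAcrossRuns(List<StringBuilder> runs, String target, String replacement) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        int count = 0;
        int searchFrom = 0;
        while (true) {
            int start = join(runs).indexOf(target, searchFrom);
            if (start < 0) {
                return count;
            }
            // Find the run holding the first matched character (skips empty runs).
            int run = 0;
            int offset = start;
            while (offset >= runs.get(run).length()) {
                offset -= runs.get(run).length();
                run++;
            }
            // Insert the replacement, then delete the matched characters after it.
            runs.get(run).insert(offset, replacement);
            int pos = offset + replacement.length();
            int remaining = target.length();
            while (remaining > 0) {
                StringBuilder sb = runs.get(run);
                int take = Math.min(remaining, sb.length() - pos);
                sb.delete(pos, pos + take);
                remaining -= take;
                run++;
                pos = 0;
            }
            count++;
            // Resume after the inserted text so a replacement containing the target cannot loop.
            searchFrom = start + replacement.length();
        }
    }

    /** Concatenates all run texts into one string. */
    public static String join(List<StringBuilder> runs) {
        StringBuilder all = new StringBuilder();
        for (StringBuilder sb : runs) {
            all.append(sb);
        }
        return all.toString();
    }

    public static void main(String[] args) {
        List<StringBuilder> runs = new ArrayList<>();
        runs.add(new StringBuilder("Dear $$ke"));
        runs.add(new StringBuilder("y$$!"));
        int n = replaceAcrossRuns(runs, "$$key$$", "ABCD");
        System.out.println(n + " replacement(s): " + runs); // prints: 1 replacement(s): [Dear ABCD, !]
    }
}
```

In this sketch the replacement inherits the formatting of the run where the match starts, and runs that end up empty are left in place rather than removed.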
answered Sep 20 '22 12:09 by KishanCS


These document formats are complex objects that you almost certainly don't want to parse yourself. I would strongly suggest taking a look at the Apache POI libraries: they have functions to load and save .doc and .docx files, and to access and modify their content.

They are well documented, open source, currently maintained and freely available.
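Reading these files as text fails because neither format is text. A .docx file is a ZIP package of XML parts and starts with the ZIP signature `PK\3\4`, while a legacy .doc file is a binary OLE2 compound file starting with the bytes `D0 CF 11 E0 A1 B1 1A E1`. The following is a minimal, JDK-only sketch of a hypothetical helper (`WordFormatSniffer` is an illustrative name, not a POI class) that checks those signatures, so a program could decide whether to hand a file to POI's HWPF classes (.doc) or XWPF classes (.docx) without trusting the file extension. A ZIP signature alone does not prove the file is a Word document, since .xlsx and plain .zip files share it, so treat the result as a hint.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: guesses the Word container format from a file's first bytes. */
public class WordFormatSniffer {

    public enum Format { OLE2_DOC, OOXML_DOCX, UNKNOWN }

    // OLE2 compound file signature used by legacy .doc files (the HWPF side of POI).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4"; .docx files are ZIP packages (the XWPF side).
    // Other ZIP-based files share this signature, so it is only a hint.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a header (the first few bytes of a file). */
    public static Format detect(byte[] header) {
        if (startsWith(header, OLE2)) {
            return Format.OLE2_DOC;
        }
        if (startsWith(header, ZIP)) {
            return Format.OOXML_DOCX;
        }
        return Format.UNKNOWN;
    }

    /** Reads up to 8 bytes from the file and classifies them. */
    public static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2.length];
            int n = in.readNBytes(header, 0, header.length); // Java 9+
            return detect(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + detect(Paths.get(name)));
        }
    }
}
```

For example, `java WordFormatSniffer rpt.doc rpt.docx` would print one detected format per file name given on the command line.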

In summary, use these libraries to: a) load the file, b) go through its content programmatically and modify it as needed (i.e. do the search and replace), and c) save it back to disk.

answered Sep 22 '22 12:09 by Elemental