How can I convert a PDF file to a Word file using Java? [closed]

Tags: java, ms-word, pdf

How can I convert a PDF file to a Word file using Java?

And is it as easy as it looks?

Gentuzos asked Aug 01 '13 06:08


2 Answers

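Try Apache PDFBox. Note that the example below only extracts the text of the PDF and writes it to a plain-text file; it does not produce a Word document by itself.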

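import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;

// These imports assume the PDFBox 1.8.x package layout, which the parser calls below match.
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class PDFTextReader
{
    // Placeholder input path (not given in the original answer); point it at your own PDF.
    private static final String PDF_FILE_PATH = "/sample/filename.pdf";

    // Extracts the text of a PDF; returns null if the file is missing or cannot be parsed.
    static String pdftoText(String fileName) {
        PDFParser parser;
        String parsedText = null;
        PDDocument pdDoc = null;
        COSDocument cosDoc = null;
        File file = new File(fileName);
        if (!file.isFile()) {
            System.err.println("File " + fileName + " does not exist.");
            return null;
        }
        try {
            parser = new PDFParser(new FileInputStream(file));
        } catch (IOException e) {
            System.err.println("Unable to open PDF Parser. " + e.getMessage());
            return null;
        }
        try {
            parser.parse();
            cosDoc = parser.getDocument();
            PDFTextStripper pdfStripper = new PDFTextStripper();
            pdDoc = new PDDocument(cosDoc);
            parsedText = pdfStripper.getText(pdDoc);
        } catch (Exception e) {
            System.err.println("An exception occurred in parsing the PDF document: "
                    + e.getMessage());
        } finally {
            try {
                // Closing the PDDocument also closes the COSDocument it wraps.
                if (pdDoc != null) {
                    pdDoc.close();
                } else if (cosDoc != null) {
                    cosDoc.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return parsedText;
    }

    public static void main(String args[]) {
        String content = pdftoText(PDF_FILE_PATH);
        if (content == null) {
            // pdftoText has already reported what went wrong.
            return;
        }

        // FileWriter creates the output file if it does not exist yet.
        try (BufferedWriter bw = new BufferedWriter(new FileWriter("/sample/filename.txt"))) {
            bw.write(content);
            System.out.println("Done");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}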
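
The PDFBox example stops at plain text. As a rough sketch of one way to get from that text to a file Word can open, the code below wraps each line of text in a paragraph of a minimal .docx package, using only the JDK's zip support. The class name MinimalDocxWriter, its methods, and the output name sample.docx are assumptions made for illustration and are not part of PDFBox. Layout, fonts, images, and tables from the PDF are not preserved; for anything beyond plain paragraphs, a library such as Apache POI is probably a better fit.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not a PDFBox class): stores plain text in a minimal .docx package.
final class MinimalDocxWriter {

    private static final String XML_HEADER =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    // Declares the content type of each part in the package.
    private static final String CONTENT_TYPES = XML_HEADER
            + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
            + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
            + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
            + "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
            + "</Types>";

    // Points the package at its main document part.
    private static final String RELS = XML_HEADER
            + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
            + "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>"
            + "</Relationships>";

    // Drops control characters that XML 1.0 forbids, then escapes markup characters.
    static String escapeXml(String s) {
        return s.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "")
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Builds word/document.xml with one paragraph per line of the input text.
    static String documentXml(String text) {
        StringBuilder sb = new StringBuilder(XML_HEADER);
        sb.append("<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"><w:body>");
        for (String line : text.split("\\r?\\n")) {
            sb.append("<w:p><w:r><w:t xml:space=\"preserve\">")
              .append(escapeXml(line))
              .append("</w:t></w:r></w:p>");
        }
        sb.append("</w:body></w:document>");
        return sb.toString();
    }

    // Writes the three parts of the package as a zip archive to the given stream.
    static void writeDocx(String text, OutputStream out) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            addEntry(zip, "[Content_Types].xml", CONTENT_TYPES);
            addEntry(zip, "_rels/.rels", RELS);
            addEntry(zip, "word/document.xml", documentXml(text));
        }
    }

    private static void addEntry(ZipOutputStream zip, String name, String body) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(body.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by a text extractor such as pdftoText.
        String text = "First paragraph\nSecond paragraph with <angle brackets> & an ampersand";
        try (OutputStream out = Files.newOutputStream(Paths.get("sample.docx"))) {
            writeDocx(text, out);
        }
    }
}
```

In the PDFBox example, the extracted content string could be passed to writeDocx instead of being written to a .txt file.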
newuser answered Nov 08 '22 18:11


I have looked deeply into this matter and found that, for proper results, you cannot avoid using MS Word. Even funded projects such as LibreOffice struggle with proper conversion, as the Word format is rather complex and changes between versions. Only MS Word keeps track of these changes.

For this reason, I implemented documents4j, which delegates conversions to MS Word through a Java API. Furthermore, it allows you to move the conversions to a different machine, which you can contact using a REST API. You can find detailed information on its GitHub page.

Rafael Winterhalter answered Nov 08 '22 18:11