How can I convert a pdf file to word file using Java?
And, is it as easy as it looks like?
Try PDFBOX
public class PDFTextReader
{
static String pdftoText(String fileName) {
PDFParser parser;
String parsedText = null;
PDFTextStripper pdfStripper = null;
PDDocument pdDoc = null;
COSDocument cosDoc = null;
File file = new File(fileName);
if (!file.isFile()) {
System.err.println("File " + fileName + " does not exist.");
return null;
}
try {
parser = new PDFParser(new FileInputStream(file));
} catch (IOException e) {
System.err.println("Unable to open PDF Parser. " + e.getMessage());
return null;
}
try {
parser.parse();
cosDoc = parser.getDocument();
pdfStripper = new PDFTextStripper();
pdDoc = new PDDocument(cosDoc);
parsedText = pdfStripper.getText(pdDoc);
} catch (Exception e) {
System.err
.println("An exception occured in parsing the PDF Document."
+ e.getMessage());
} finally {
try {
if (cosDoc != null)
cosDoc.close();
if (pdDoc != null)
pdDoc.close();
} catch (Exception e) {
e.printStackTrace();
}
}
return parsedText;
}
public static void main(String args[]){
try {
String content = pdftoText(PDF_FILE_PATH);
File file = new File("/sample/filename.txt");
// if file doesnt exists, then create it
if (!file.exists()) {
file.createNewFile();
}
FileWriter fw = new FileWriter(file.getAbsoluteFile());
BufferedWriter bw = new BufferedWriter(fw);
bw.write(content);
bw.close();
System.out.println("Done");
} catch (IOException e) {
e.printStackTrace();
}
}
}
I have looked deeply into this matter and I found that for proper results, you need cannot avoid using MS Word. Even funded projects such as LibreOffice struggle with the proper conversion as the Word format is rather complex and changes over the versions. Only MS Word keeps track of this.
For this reason, I implemented documents4j what delegates conversions to MS Word using a Java API. Furthermore, it allows you to move the conversions to a different machine which you can contact using a REST API. You find detailed information on its GitHub page.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With