I have an word document Docx file
As you can see in the word document there are a number of questions with Bullet Points. Right now I am trying to extract each paragraph from the file using apache POI. Here is my current code
public static String readDocxFile(String fileName) {
try {
File file = new File(fileName);
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
String whole = "";
for (XWPFParagraph para : paragraphs) {
System.out.println(para.getText());
whole += "\n" + para.getText();
}
fis.close();
document.close();
return whole;
} catch (Exception e) {
e.printStackTrace();
return "";
}
}
The problem with above method is that it is printing each line instead of paragraphs. Also the bullet points are also gone from extracted whole
string. The whole
is returned a plain string.
Can anyone explain what I am doing wrong. Also please suggest if you have a better idea to solve it.
Above code is correct and I ran your code on my system that giving each and every paragraphs , I think problem with writting content on docx file whenever I wrote content in bullet points and uses 'enter' key than that breaks my current bullet points and above code make that breaked-line as saparate paragraph.
I am writting below code sample may be It's useful for you take a look here I am using Set datastructure for ignoring duplicate questions from docx .
Dependency of apache poi is below
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.7</version>
</dependency>
Code Sample :
package com;
import java.io.File;
import java.io.FileInputStream;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.springframework.util.ObjectUtils;
public class App {
public static void main(String...strings) throws Exception{
Set<String> bulletPoints = fileExtractor();
bulletPoints.forEach(point -> {
System.out.println(point);
});
}
public static Set<String> fileExtractor() throws Exception{
FileInputStream fis = null;
try {
Set<String> bulletPoints = new HashSet<>();
File file = new File("/home/deskuser/Documents/query.docx");
fis = new FileInputStream(file.getAbsolutePath());
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
paragraphs.forEach(para -> {
System.out.println(para.getText());
if(!ObjectUtils.isEmpty(para.getText())){
bulletPoints.add(para.getText());
}
});
fis.close();
return bulletPoints;
} catch (Exception e) {
e.printStackTrace();
throw new Exception("error while extracting file.", e);
}finally{
if(!ObjectUtils.isEmpty(fis)){
fis.close();
}
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With