I need to read a pdf file with filepath "C:\file.pdf" and write it to outputStream. What is the easiest way to do that?
@Controller
public class ExportTlocrt {
@Autowired
private PhoneBookService phoneBookSer;
private void setResponseHeaderTlocrtPDF(HttpServletResponse response) {
response.setContentType("application/pdf");
response.setHeader("content-disposition", "attachment; filename=Tlocrt.pdf" );
}
@RequestMapping(value = "/exportTlocrt.html", method = RequestMethod.POST)
public void exportTlocrt(Model model, HttpServletResponse response, HttpServletRequest request){
setResponseHeaderTlocrtPDF(response);
File f = new File("C:\\Tlocrt.pdf");
try {
OutputStream os = response.getOutputStream();
byte[] buf = new byte[8192];
InputStream is = new FileInputStream(f);
int c = 0;
while ((c = is.read(buf, 0, buf.length)) > 0) {
os.write(buf, 0, c);
os.flush();
}
os.close();
is.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
...........................................................................................
JDK does not provide any class to read PDF file. In order to read a PDF file, we depend on the third-party library. There are several third-party libraries are available to read a PDF file. So, in this section, we will use the Apache Tika library for reading a PDF file in Java.
import java.io.*;
public class FileRead {
public static void main(String[] args) throws IOException {
File f=new File("C:\\Documents and Settings\\abc\\Desktop\\abc.pdf");
OutputStream oos = new FileOutputStream("test.pdf");
byte[] buf = new byte[8192];
InputStream is = new FileInputStream(f);
int c = 0;
while ((c = is.read(buf, 0, buf.length)) > 0) {
oos.write(buf, 0, c);
oos.flush();
}
oos.close();
System.out.println("stop");
is.close();
}
}
The easiest way so far. Hope this helps.
You can use PdfBox from Apache which is simple to use and has good performance.
Here is an example of extracting text from a PDF file (you can read more here) :
import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.util.*;
public class PDFTest {
public static void main(String[] args){
PDDocument pd;
BufferedWriter wr;
try {
File input = new File("C:\\Invoice.pdf"); // The PDF file from where you would like to extract
File output = new File("C:\\SampleText.txt"); // The text file where you are going to store the extracted data
pd = PDDocument.load(input);
System.out.println(pd.getNumberOfPages());
System.out.println(pd.isEncrypted());
pd.save("CopyOfInvoice.pdf"); // Creates a copy called "CopyOfInvoice.pdf"
PDFTextStripper stripper = new PDFTextStripper();
wr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(output)));
stripper.writeText(pd, wr);
if (pd != null) {
pd.close();
}
// I use close() to flush the stream.
wr.close();
} catch (Exception e){
e.printStackTrace();
}
}
}
UPDATE:
You can get the text using PDFTextStripper:
PDFTextStripper reader = new PDFTextStripper();
String pageText = reader.getText(pd); // PDDocument object created
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With