I need to download a PDF file from a web server to my PC and save it locally.
I used HttpClient to connect to the web server and get the content body:
HttpEntity entity=response.getEntity();
InputStream in=entity.getContent();
String stream = CharStreams.toString(new InputStreamReader(in));
int size=stream.length();
System.out.println("stringa html page LENGTH:"+stream.length());
System.out.println(stream);
SaveToFile(stream);
Then I save the content to a file:
//check CRLF (I don't know if I need to do this)
String[] fix=stream.split("\r\n");
File file=new File("C:\\Users\\augusto\\Desktop\\progetti web\\test\\test2.pdf");
PrintWriter out = new PrintWriter(new FileWriter(file));
for (int i = 0; i < fix.length; i++) {
out.print(fix[i]);
out.print("\n");
}
out.close();
I also tried to save the String content to a file directly:
OutputStream out=new FileOutputStream("pathPdfFile");
out.write(stream.getBytes());
out.close();
But the result is always the same: I can open the PDF file, but I see only white pages. Is the mistake related to the charset encoding of the PDF data between stream and endstream? Does the content between stream and endstream need to be handled in some other way?
I hope this helps avoid any misunderstanding about what I want to do.
This is my login (it works perfectly):
public static void postForm(){
String cookie="";
try {
System.out.println("POSTFORM ###################################");
String postURL = "http://login.libero.it/logincheck.php";
HttpPost post = new HttpPost(postURL);
post.setHeader("User-Agent", "Chrome/14.0.835.202");
post.setHeader("Referer","http://login.libero.it/?layout=m&service_id=m_mail&ret_url=http://m.mailbeta.libero.it/m/wmm/auth/check");
if(cookieVector.size()>0){
for(int i=0;i<cookieVector.size();i++){
cookie=cookie+cookieVector.elementAt(i).toString().replace("Set-Cookie:", "")+";";
}
post.setHeader("Cookie",cookie);
}
//System.out.println("sequenza cookie post:"+cookie);
List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair("SERVICE_ID", "m_mail"));
params.add(new BasicNameValuePair("LAYOUT", "m"));
params.add(new BasicNameValuePair("DEVICE", ""));
params.add(new BasicNameValuePair("RET_URL","http://m.mailbeta.libero.it/m/wmm/auth/check"));
params.add(new BasicNameValuePair("LOGINID", "secret"));
params.add(new BasicNameValuePair("PASSWORD", "secret"));
UrlEncodedFormEntity ent = new UrlEncodedFormEntity(params,HTTP.UTF_8);
System.out.println("stringa urlPost:"+ent.toString());
post.setEntity(ent);
HttpResponse responsePOST = client.execute(post);
System.out.println("Response postForm: " + responsePOST.getStatusLine());
Header[] allHeaders = responsePOST.getAllHeaders();
String location = "";
for (Header header : allHeaders) {
if("location".equalsIgnoreCase(header.getName())) location = header.getValue();
responsePOST.addHeader(header.getName(), header.getValue());
}
cookieVector.clear();
Header[] headerx=responsePOST.getHeaders("Set-Cookie");
System.out.println("array header:"+headerx.length);
for(int i=0;i<headerx.length;i++){
System.out.println("restituito cookie POST:"+headerx[i].getValue());
cookieVector.add(headerx[i]);
//System.out.println("cookie trovato POST:"+cookieVector.elementAt(i));
}
//System.out.println("inseriti"+cookieVector.size()+""+"elements");
//HttpEntity resEntity = responsePOST.getEntity();
// populate redirect information in response
//CHECK LOGIN RESULT
if(location.contains("https://login.libero.it/logincheck.php")){
loginError=1;
}
System.out.println("Redirecting to: " + location);
//EntityUtils.consume(resEntity);
responsePOST.getEntity().consumeContent();
System.out.println("torno a GET:"+"url:"+location+"cookieVector size:"+cookieVector.size());
get(location,"http://login.libero.it/logincheck.php");
} catch (IOException ex) {
Logger.getLogger(LiberoLoginNew.class.getName()).log(Level.SEVERE, null, ex);
}
}
Once logged in, I'm able to access the file's link (PDF, image, DOC, etc.). Take a PDF file as an example:
public static void httpConnection(String url,String referer,String cookieAuth){
try {
String location="";
String cookie="";
HttpResponse response;
HttpGet get;
HttpEntity respEntity;
Referer=referer;
System.out.println("HTTPCONNECTION ################################");
System.out.println("connessione a:"+url+"............");
get = new HttpGet(url);
if(referer.length()>0){
//httpget.setHeader("Referer",referer );
}
if(attachmentURL.size()==0){
get.setHeader("User-Agent", "Chrome/14.0.835.202");
}else{
get.setHeader("Accept-charset", "UTF-8");
get.setHeader("Content-type", "application/pdf");
}
if(cookieVector.size()>0){
System.out.println("iserisco cookie da vector");
for(int i=0;i<cookieVector.size();i++){
cookie=cookie+cookieVector.elementAt(i).toString().replace("Set-Cookie:", "")+";";
}
get.setHeader("Cookie", cookie);
}else if(cookieAuth.length()>0){
System.out.println("inserisco cookieAuth....");
System.out.println("valore cookieSession:"+cookieAuth);
get.setHeader("Cookie",cookieAuth.replace("Set-Cookie:", "")+";");
}
response = client.execute(get);
cookieVector.clear();//reset cookie
System.out.println("home get: " + response.getStatusLine());
Header[] headery=response.getAllHeaders();
for(int j=0;j<headery.length;j++){
System.out.println(headery[j].getName()+" "+" VALUE:"+" "+headery[j].getValue());
}
Header[] headerx=response.getHeaders("Set-Cookie");
System.out.println("array header:"+headerx.length);
System.out.print("httpconnection SERVER HEADERS ###############");
for(int i=0;i<headerx.length;i++){
if("location".equalsIgnoreCase(headerx[i].getName())){
location = headerx[i].getValue();
//ResponseGET.addHeader(headerx[i].getName(), header.getValue());
}
//System.out.println(headerx[i].getValue());
cookieVector.add(headerx[i]);
}
//STREAM CONTENT BODY
HttpEntity entity2=response.getEntity();
InputStream in=entity2.getContent(); // <== THIS IS HOW I GET THE RESPONSE STREAM
if(attachmentURL.size()>0){
saveAttachment(in);//SAVE FILE <==
}else{
from(in,htmlpage);//Parse and grab: message title,subject,attachments. If attachments are found then come back here and execute the method saveAttachment.
in.close();
}
} catch (IOException ex) {
Logger.getLogger(LiberoLoginNew.class.getName()).log(Level.SEVERE, null, ex);
}
}
The httpConnection method works and I can download the file!
Server Response:
Date VALUE: Fri, 18 Nov 2011 13:09:46 GMT
Server VALUE: Apache/2.2.21 (Unix) mod_jk/1.2.23
Set-Cookie VALUE: MST_PVP=tiQZO3nbl9_5f_OQXtJP32YiqQx_5f_kSh6F6Io7r3xS; Domain=m.libero.it; Path=/
Content-Type VALUE: application/octet-stream
Expires VALUE: Fri, 18 Nov 2011 15:09:46 GMT
Transfer-Encoding VALUE: chunked
Example of response body:
%PDF-1.7
1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 4 0 R
>>
>>
/Contents 5 0 R
>>
endobj
4 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Times-Roman
>>
endobj
5 0 obj % page content
<<
/Length 44
>>
stream
BT
70 50 TD
/F1 12 Tf
(Hello, world!) Tj
ET
endstream
endobj
xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
trailer
<<
/Size 6
/Root 1 0 R
>>
startxref
492
%%EOF
Now, let's start from here. Can you please tell me what I have to do to save the stream to a file?
########### SOLVED: To save a file locally from the stream data while respecting its binary nature, I did this:
public void saveFile(InputStream is){
try {
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(new File("test.pdf"))));
int c;
while((c = is.read()) != -1) {
out.writeByte(c);
}
out.close();
is.close();
}catch(IOException e) {
System.err.println("Error Writing/Reading Streams.");
}
}
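If you want a shorter method, you can use IOUtils from Apache Commons IO (org.apache.commons.io.IOUtils). Note that toByteArray reads the whole file into memory, so this is more concise rather than more memory-efficient:
public void saveFile(InputStream is) throws IOException {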
OutputStream os=new FileOutputStream(new File("test.pdf"));
byte[] bytes = IOUtils.toByteArray(is);
os.write(bytes);
os.close();
}
Never store binary data in a String.
Never use PrintWriter for binary data.
Never write binary files line by line.
I don't want to be harsh or impolite, but these three nevers have to take root in your mind! :)
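To see why the first rule matters, here is a minimal sketch of what happens when raw bytes pass through a String. The class name `BinaryRoundTrip` and the sample bytes are made up for illustration, and UTF-8 is assumed as the charset (the code in the question uses the platform default, which may behave differently):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryRoundTrip {

    // Decode bytes into a String and encode them back, similar to what the
    // question's code does (assumption: UTF-8 is the charset in use).
    static byte[] roundTripThroughString(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%P" followed by three bytes that are not valid UTF-8, like the
        // arbitrary bytes found inside a compressed PDF stream.
        byte[] original = {0x25, 0x50, (byte) 0xE2, (byte) 0xFF, (byte) 0x80};
        byte[] viaString = roundTripThroughString(original);

        System.out.println("original:   " + Arrays.toString(original));
        System.out.println("via String: " + Arrays.toString(viaString));
        System.out.println("identical:  " + Arrays.equals(original, viaString));
    }
}
```

Bytes that do not form valid characters are replaced during decoding, so the original values cannot be recovered afterwards. The plain-text parts of the PDF survive, but the binary content streams are damaged, which would be consistent with a file that opens and shows blank pages.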
You can see this page for an example on how to download a binary file. I don't like this example because it caches the whole document in memory (what happens if its size is 5GB?) but you can start from this. :)
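A streaming copy avoids holding the whole document in memory. The sketch below relies on the standard `java.nio.file.Files.copy(InputStream, Path, CopyOption...)` method (Java 7 and later); the class name `StreamToFile`, the method name `save`, and the target file name are placeholders chosen for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class StreamToFile {

    // Copy the stream to the target file as raw bytes, replacing any existing
    // file, and close the stream afterwards. Returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(); any InputStream works the same way.
        InputStream in = new ByteArrayInputStream(
                new byte[] {0x25, 0x50, 0x44, 0x46, (byte) 0xFF});
        long written = save(in, Paths.get("test.pdf"));
        System.out.println("bytes written: " + written);
    }
}
```

In the question's code, the stream returned by `entity.getContent()` could be passed to `save` directly, without ever converting it to a String.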
Use FileUtils from Apache Commons IO. I tried it with a small PDF and a 60 MB JAR. It works great!
import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.apache.commons.io.FileUtils;
String uri = "http://localhost:8080/PMInstaller/f1.pdf";
URL url = new URL(uri);
File destination = new File("f1.pdf");
FileUtils.copyURLToFile(url, destination);
Can't you just download the file directly from the link?
public static void downloadFile(URL from, File to, boolean overwrite) throws Exception {
if (to.exists()) {
if (!overwrite)
throw new Exception("File " + to.getAbsolutePath() + " exists already.");
if (!to.delete())
throw new Exception("Cannot delete the file " + to.getAbsolutePath() + ".");
}
// Copy the raw bytes through a buffer; never go through a Reader or a String.
// try-with-resources closes both streams even if the copy fails.
try (InputStream is = from.openStream();
FileOutputStream fos = new FileOutputStream(to)) {
byte[] buffer = new byte[8192];
int n;
while ((n = is.read(buffer)) != -1) {
fos.write(buffer, 0, n);
}
}
}