Java - Download a Binary File (e.g. PDF) from a Webserver

I need to download a PDF file from a webserver to my PC and save it locally.

I used HttpClient to connect to the webserver and get the content body:

HttpEntity entity = response.getEntity();
InputStream in = entity.getContent();

String stream = CharStreams.toString(new InputStreamReader(in));
int size = stream.length();
System.out.println("html page string LENGTH:" + stream.length());
System.out.println(stream);
SaveToFile(stream);

Then I save the content to a file:

// check CRLF (I don't know if I need to do this)
String[] fix = stream.split("\r\n");

File file = new File("C:\\Users\\augusto\\Desktop\\progetti web\\test\\test2.pdf");
PrintWriter out = new PrintWriter(new FileWriter(file));
for (int i = 0; i < fix.length; i++) {
    out.print(fix[i]);
    out.print("\n");
}
out.close();
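Apart from any charset problem, the CRLF loop above also changes the bytes: every "\r\n" is written back as "\n", which shifts all later byte offsets, while a PDF's xref table (like the one in the response body shown further down) records exact byte offsets. A minimal JDK-only sketch that mimics the loop; the class name and sample input are hypothetical:

```java
public class LineEndingRewrite {
    // Mimics the loop above: split on "\r\n", then print each piece followed by "\n".
    // Note that a "\n" is also appended after the last piece.
    static String rewrite(String text) {
        StringBuilder sb = new StringBuilder();
        for (String line : text.split("\r\n")) {
            sb.append(line).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "%PDF-1.7\r\n1 0 obj\r\n";
        String rewritten = rewrite(original);
        // Each CRLF loses one byte, so everything after it moves to a smaller offset
        System.out.println(original.length() + " -> " + rewritten.length());
    }
}
```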

I also tried writing the String content to the file directly:

OutputStream out = new FileOutputStream("pathPdfFile");
out.write(stream.getBytes());
out.close();

But the result is always the same: I can open the PDF file, but I see only blank pages. Is the mistake related to the charset encoding of the content between stream and endstream? Does the PDF content between stream and endstream need to be handled in some other way?
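To illustrate the encoding question, here is a minimal JDK-only sketch (the class name and sample bytes are hypothetical): decoding bytes into a String replaces any byte sequence that is invalid in the chosen charset with the replacement character U+FFFD, so encoding the String again does not give back the original bytes. Plain ASCII survives the trip; other bytes may not. UTF-8 is used explicitly here, whereas the question's InputStreamReader and getBytes() use the platform default charset, so the exact damage varies from system to system.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringRoundTrip {
    // Bytes -> String -> bytes, as in CharStreams.toString(new InputStreamReader(in))
    // followed by stream.getBytes(), but with UTF-8 chosen explicitly
    static byte[] roundTrip(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8); // invalid sequences become U+FFFD
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample: an ASCII header followed by bytes that are not valid UTF-8
        byte[] original = {'%', 'P', 'D', 'F', '-', '1', '.', '7', '\n',
                           (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        byte[] copy = roundTrip(original);
        System.out.println("original bytes:   " + original.length);
        System.out.println("after round trip: " + copy.length);
        System.out.println("identical: " + Arrays.equals(original, copy));
    }
}
```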


To avoid any misunderstanding about what I want to do, here is the full context.

This is my login code (it works perfectly):

public static void postForm() {
    String cookie = "";
    try {
        System.out.println("POSTFORM ###################################");
        String postURL = "http://login.libero.it/logincheck.php";
        HttpPost post = new HttpPost(postURL);
        post.setHeader("User-Agent", "Chrome/14.0.835.202");
        post.setHeader("Referer", "http://login.libero.it/?layout=m&service_id=m_mail&ret_url=http://m.mailbeta.libero.it/m/wmm/auth/check");
        if (cookieVector.size() > 0) {
            for (int i = 0; i < cookieVector.size(); i++) {
                cookie = cookie + cookieVector.elementAt(i).toString().replace("Set-Cookie:", "") + ";";
            }
            post.setHeader("Cookie", cookie);
        }
        //System.out.println("post cookie sequence:" + cookie);
        List<NameValuePair> params = new ArrayList<NameValuePair>();
        params.add(new BasicNameValuePair("SERVICE_ID", "m_mail"));
        params.add(new BasicNameValuePair("LAYOUT", "m"));
        params.add(new BasicNameValuePair("DEVICE", ""));
        params.add(new BasicNameValuePair("RET_URL", "http://m.mailbeta.libero.it/m/wmm/auth/check"));
        params.add(new BasicNameValuePair("LOGINID", "secret"));
        params.add(new BasicNameValuePair("PASSWORD", "secret"));
        UrlEncodedFormEntity ent = new UrlEncodedFormEntity(params, HTTP.UTF_8);
        System.out.println("urlPost string:" + ent.toString());
        post.setEntity(ent);
        HttpResponse responsePOST = client.execute(post);
        System.out.println("Response postForm: " + responsePOST.getStatusLine());
        Header[] allHeaders = responsePOST.getAllHeaders();

        String location = "";
        for (Header header : allHeaders) {
            if ("location".equalsIgnoreCase(header.getName())) location = header.getValue();
            responsePOST.addHeader(header.getName(), header.getValue());
        }
        cookieVector.clear();
        Header[] headerx = responsePOST.getHeaders("Set-Cookie");
        System.out.println("header array:" + headerx.length);
        for (int i = 0; i < headerx.length; i++) {
            System.out.println("POST returned cookie:" + headerx[i].getValue());
            cookieVector.add(headerx[i]);
            //System.out.println("POST cookie found:" + cookieVector.elementAt(i));
        }
        //System.out.println("inserted " + cookieVector.size() + " elements");
        //HttpEntity resEntity = responsePOST.getEntity();

        // populate redirect information in response
        // CHECK LOGIN RESULT
        if (location.contains("https://login.libero.it/logincheck.php")) {
            loginError = 1;
        }
        System.out.println("Redirecting to: " + location);
        //EntityUtils.consume(resEntity);
        responsePOST.getEntity().consumeContent();
        System.out.println("back to GET: url:" + location + " cookieVector size:" + cookieVector.size());
        get(location, "http://login.libero.it/logincheck.php");
    } catch (IOException ex) {
        Logger.getLogger(LiberoLoginNew.class.getName()).log(Level.SEVERE, null, ex);
    }
}

Once logged in, I'm able to access the file's link (PDF, image, DOC, etc.). Here, as an example, we take a PDF file:

public static void httpConnection(String url, String referer, String cookieAuth) {
    try {
        String location = "";
        String cookie = "";
        HttpResponse response;
        HttpGet get;
        HttpEntity respEntity;
        Referer = referer;
        System.out.println("HTTPCONNECTION ################################");
        System.out.println("connecting to:" + url + "............");

        get = new HttpGet(url);
        if (referer.length() > 0) {
            //httpget.setHeader("Referer", referer);
        }
        if (attachmentURL.size() == 0) {
            get.setHeader("User-Agent", "Chrome/14.0.835.202");
        } else {
            get.setHeader("Accept-charset", "UTF-8");
            get.setHeader("Content-type", "application/pdf");
        }
        if (cookieVector.size() > 0) {
            System.out.println("inserting cookies from vector");
            for (int i = 0; i < cookieVector.size(); i++) {
                cookie = cookie + cookieVector.elementAt(i).toString().replace("Set-Cookie:", "") + ";";
            }
            get.setHeader("Cookie", cookie);
        } else if (cookieAuth.length() > 0) {
            System.out.println("inserting cookieAuth....");
            System.out.println("cookieSession value:" + cookieAuth);
            get.setHeader("Cookie", cookieAuth.replace("Set-Cookie:", "") + ";");
        }

        response = client.execute(get);
        cookieVector.clear(); // reset cookies

        System.out.println("home get: " + response.getStatusLine());

        Header[] headery = response.getAllHeaders();
        for (int j = 0; j < headery.length; j++) {
            System.out.println(headery[j].getName() + " " + " VALUE:" + " " + headery[j].getValue());
        }
        Header[] headerx = response.getHeaders("Set-Cookie");
        System.out.println("header array:" + headerx.length);
        System.out.print("httpconnection SERVER HEADERS ###############");
        for (int i = 0; i < headerx.length; i++) {
            if ("location".equalsIgnoreCase(headerx[i].getName())) {
                location = headerx[i].getValue();
                //ResponseGET.addHeader(headerx[i].getName(), header.getValue());
            }
            //System.out.println(headerx[i].getValue());
            cookieVector.add(headerx[i]);
        }

        // STREAM CONTENT BODY
        HttpEntity entity2 = response.getEntity();
        InputStream in = entity2.getContent(); // <== THIS IS THE WAY I GET THE STREAM RESPONSE

        if (attachmentURL.size() > 0) {
            saveAttachment(in); // SAVE FILE <==
        } else {
            // Parse and grab: message title, subject, attachments. If attachments are found,
            // come back here and execute the saveAttachment method.
            from(in, htmlpage);
            in.close();
        }

    } catch (IOException ex) {
        Logger.getLogger(LiberoLoginNew.class.getName()).log(Level.SEVERE, null, ex);
    }
}

The httpConnection method works, and I can download the file.

Server response headers:

Date  VALUE: Fri, 18 Nov 2011 13:09:46 GMT
Server  VALUE: Apache/2.2.21 (Unix) mod_jk/1.2.23
Set-Cookie  VALUE: MST_PVP=tiQZO3nbl9_5f_OQXtJP32YiqQx_5f_kSh6F6Io7r3xS; Domain=m.libero.it; Path=/
Content-Type  VALUE: application/octet-stream
Expires  VALUE: Fri, 18 Nov 2011 15:09:46 GMT
Transfer-Encoding  VALUE: chunked

Example of response body:

%PDF-1.7

1 0 obj  % entry point
<<
  /Type /Catalog
  /Pages 2 0 R
>>
endobj

2 0 obj
<<
  /Type /Pages
  /MediaBox [ 0 0 200 200 ]
  /Count 1
  /Kids [ 3 0 R ]
>>
endobj

3 0 obj
<<
  /Type /Page
  /Parent 2 0 R
  /Resources <<
    /Font <<
      /F1 4 0 R
    >>
  >>
  /Contents 5 0 R
>>
endobj

4 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /BaseFont /Times-Roman
>>
endobj

5 0 obj  % page content
<<
  /Length 44
>>
stream
BT
70 50 TD
/F1 12 Tf
(Hello, world!) Tj
ET
endstream
endobj

xref
0 6
0000000000 65535 f 
0000000010 00000 n 
0000000079 00000 n 
0000000173 00000 n 
0000000301 00000 n 
0000000380 00000 n 
trailer
<<
  /Size 6
  /Root 1 0 R
>>
startxref
492
%%EOF
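Since the body starts with %PDF-1.7, one cheap sanity check after saving is to confirm that the file begins with the bytes %PDF-. This is a hedged sketch (hypothetical class and method names, Java 9+ for readNBytes). It catches the common case where the server returned an HTML login or error page instead of the document, but it cannot detect damage further inside the file, such as the corruption described in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class PdfHeaderCheck {
    // A PDF file is expected to begin with the ASCII bytes "%PDF-" followed by the version
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Reads only the first five bytes, so the check stays cheap even for large files
    public static boolean looksLikePdf(Path file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = in.readNBytes(head, 0, head.length); // Java 9+
            return read == MAGIC.length && Arrays.equals(head, MAGIC);
        }
    }
}
```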

Now, let's start from here. Can you please tell me what I have to do to save the stream to a file?

########### SOLVED:

To save the stream data to a local file while preserving its binary nature, I did this:

public void saveFile(InputStream is) {
    try {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(new File("test.pdf"))));
        int c;
        while ((c = is.read()) != -1) {
            out.writeByte(c);
        }
        out.close();
        is.close();
    } catch (IOException e) {
        System.err.println("Error Writing/Reading Streams.");
    }
}

If you want a shorter and faster method, you can use IOUtils from Apache Commons IO (org.apache.commons.io.IOUtils); note that toByteArray reads the whole file into memory:

public void saveFile(InputStream is) throws IOException {
    OutputStream os = new FileOutputStream(new File("test.pdf"));
    byte[] bytes = IOUtils.toByteArray(is);
    os.write(bytes);
    os.close();
}
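Both methods above work on raw bytes, which is what fixes the problem, but the first issues one read() call per byte and the second holds the whole file in memory. A middle ground, sketched here assuming Java 7+ for try-with-resources: copy through a fixed 8 KB buffer and close both streams even when an exception occurs. The class name, the path parameter and the returned byte count are illustrative additions, and unlike the original this version lets IOException propagate instead of printing a message.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BinarySaver {
    // Copies every byte from 'is' into the file at 'path' and closes both streams,
    // even if an exception is thrown part-way through. Returns the number of bytes written.
    public static long saveFile(InputStream is, String path) throws IOException {
        long total = 0;
        try (InputStream in = is;
             OutputStream out = new BufferedOutputStream(new FileOutputStream(path))) {
            byte[] buffer = new byte[8192];   // read in chunks instead of one byte per call
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);      // write exactly the n bytes just read
                total += n;
            }
        }
        return total;
    }
}
```

On Java 9 and later, is.transferTo(out) performs the same kind of buffered copy in a single call.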
asked Nov 17 '11 at 17:11 by Augusto Picciani


3 Answers

Never store binary data into a String.

Never use PrintWriter for binary data.

Never write binary files line by line.

I don't want to be harsh or impolite, but these three "nevers" have to take root in your mind! :)

You can see this page for an example of how to download a binary file. I don't like this example because it caches the whole document in memory (what happens if it is 5 GB?), but you can start from it. :)

answered Nov 27 '22 at 09:11 by gd1
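The 5 GB concern in the answer above is avoided by streaming instead of buffering the whole document. A minimal sketch, assuming Java 7+, using java.nio.file.Files.copy(InputStream, Path, CopyOption...), which copies in chunks and returns the number of bytes copied; the class and method names are hypothetical. The caller's input stream is closed with try-with-resources, and REPLACE_EXISTING lets the copy overwrite an existing file instead of failing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamingSave {
    // Streams 'in' to 'target' chunk by chunk, so memory use does not grow with the file size.
    // Without REPLACE_EXISTING, Files.copy fails if 'target' already exists.
    public static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the question's HttpClient code, the call would be StreamingSave.save(entity2.getContent(), Paths.get("test.pdf")).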


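Use FileUtils from Apache Commons IO. I tried it with a small PDF and a 60 MB JAR, and it works great!

import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.apache.commons.io.FileUtils;

public class DownloadPdf {
    // new URL(...) and copyURLToFile(...) both throw IOException subclasses
    public static void main(String[] args) throws IOException {
        String uri = "http://localhost:8080/PMInstaller/f1.pdf";
        URL url = new URL(uri);
        File destination = new File("f1.pdf");
        FileUtils.copyURLToFile(url, destination); // copies the raw bytes to disk
    }
}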
answered Nov 27 '22 at 07:11 by Gary Eberhart
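For comparison, a sketch of the same download using only the JDK (no Commons IO), assuming Java 7+: it opens the URL, sets connect and read timeouts (values chosen arbitrarily here), and streams the body to disk with Files.copy. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class UrlDownload {
    // Downloads 'url' to 'destination' as raw bytes and returns the number of bytes written.
    // The timeout values are illustrative, not recommendations.
    public static long download(URL url, Path destination) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(10_000); // milliseconds
        connection.setReadTimeout(30_000);    // milliseconds
        try (InputStream in = connection.getInputStream()) {
            return Files.copy(in, destination, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

For a link that needs a session, such as the one in the question, the cookie would have to be sent with connection.setRequestProperty("Cookie", cookie) before getInputStream() is called; a plain URL connection does not share cookies with an HttpClient instance.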


Can't you just use the link directly?

public static void downloadFile(URL from, File to, boolean overwrite) throws IOException {
    if (to.exists()) {
        if (!overwrite)
            throw new IOException("File " + to.getAbsolutePath() + " exists already.");
        if (!to.delete())
            throw new IOException("Cannot delete the file " + to.getAbsolutePath() + ".");
    }

    // Open a single connection and read both the length and the body from it.
    URLConnection connection = from.openConnection();
    // -1 if the server sends no Content-Length (e.g. with Transfer-Encoding: chunked)
    long lengthTotal = connection.getContentLengthLong();

    long lengthSoFar = 0;
    try (InputStream is = connection.getInputStream();
         OutputStream fos = new BufferedOutputStream(new FileOutputStream(to))) {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = is.read(buffer)) != -1) {
            fos.write(buffer, 0, n); // raw bytes, no charset conversion
            lengthSoFar += n;        // together with lengthTotal, usable for progress reporting
        }
    }
}
answered Nov 27 '22 at 07:11 by hurtledown
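The lengthTotal and lengthSoFar variables in the last answer look intended for progress reporting. A hedged sketch of how that could be wired up, with a hypothetical ProgressListener callback: lengthTotal would come from URLConnection.getContentLengthLong() (Java 7+) and is -1 when the server sends no Content-Length, which is the case for the chunked response shown in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ProgressCopy {
    // Hypothetical callback; lengthTotal is -1 when the total size is unknown
    public interface ProgressListener {
        void onProgress(long lengthSoFar, long lengthTotal);
    }

    // Copies 'in' to 'out' through an 8 KB buffer, reporting progress after each chunk.
    // Returns the total number of bytes copied; neither stream is closed here.
    public static long copy(InputStream in, OutputStream out, long lengthTotal,
                            ProgressListener listener) throws IOException {
        byte[] buffer = new byte[8192];
        long lengthSoFar = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            lengthSoFar += n;
            listener.onProgress(lengthSoFar, lengthTotal);
        }
        return lengthSoFar;
    }
}
```

A listener could print a percentage when lengthTotal is positive and only the byte count otherwise.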