
Download a large pdf with jsoup

I would like to download a large PDF file with jsoup. I have tried changing the timeout and maxBodySize, but the largest file I could download was about 11 MB. Is there a way to do something like buffering, so the file is written in chunks? Below is my code.

public class Download extends Activity {

static public String nextPage;
static public Response file;
static public Connection.Response res;

@Override
protected void onCreate(Bundle savedInstanceState) {
    // TODO Auto-generated method stub
    super.onCreate(savedInstanceState);
    Bundle b = new Bundle();
    b = getIntent().getExtras();
    nextPage = b.getString("key");
    new Login().execute();
    finish();
}

private class Login extends AsyncTask<Void, Void, Void> {

    @Override
    protected void onPreExecute() {
        super.onPreExecute();
    }

    @Override
    protected Void doInBackground(Void... params) {
        try {
            res = Jsoup.connect("http://www.eclass.teikal.gr/eclass2/")
                    .ignoreContentType(true).userAgent("Mozilla/5.0")
                    .execute();

            SharedPreferences pref = getSharedPreferences(
                    MainActivity.PREFS_NAME, MODE_PRIVATE);
            String username1 = pref.getString(MainActivity.PREF_USERNAME,
                    null);
            String password1 = pref.getString(MainActivity.PREF_PASSWORD,
                    null);
            file = (Response) Jsoup
                    .connect("http://www.eclass.teikal.gr/eclass2/")
                    .ignoreContentType(true).userAgent("Mozilla/5.0")
                    .maxBodySize(1024*1024*10*2)
                    .timeout(70000*10)
                    .cookies(res.cookies()).data("uname", username1)
                    .data("pass", password1).data("next", nextPage)
                    .data("submit", "").method(Method.POST).execute();

        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;

    }

    @Override
    protected void onPostExecute(Void result) {

        String PATH = Environment.getExternalStorageDirectory()
                + "/download/";
        String name = "eclassTest.pdf";
        FileOutputStream out;
        try {

            int len = file.bodyAsBytes().length;
            out = new FileOutputStream(new File(PATH + name));
            out.write(file.bodyAsBytes(),0,len);
            out.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }
  }
}

I hope somebody can help me!

asked Mar 28 '14 at 09:03 by Falieris Ilias




1 Answer

I think it's better to download any binary file via HttpURLConnection and write it to disk in small chunks:

    InputStream input = null;
    OutputStream output = null;
    HttpURLConnection connection = null;
    try {
        URL url = new URL("http://example.com/file.pdf");
        connection = (HttpURLConnection) url.openConnection();
        connection.connect();

        // expect HTTP 200 OK, so we don't mistakenly save error report
        // instead of the file
        if (connection.getResponseCode() != HttpURLConnection.HTTP_OK) {
            return "Server returned HTTP " + connection.getResponseCode()
                    + " " + connection.getResponseMessage();
        }

        // this will be useful to display download percentage
        // might be -1: server did not report the length
        int fileLength = connection.getContentLength();

        // download the file
        input = connection.getInputStream();
        output = new FileOutputStream("/sdcard/file_name.extension");

        byte data[] = new byte[4096];
        int count;
        while ((count = input.read(data)) != -1) {
            output.write(data, 0, count);
        }
    } catch (Exception e) {
        return e.toString();
    } finally {
        try {
            if (output != null)
                output.close();
            if (input != null)
                input.close();
        } catch (IOException ignored) {
        }

        if (connection != null)
            connection.disconnect();
    }

Jsoup is for parsing and loading HTML pages, not binary files.
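
Since the question logs in with jsoup first, the session cookies from that login probably need to be sent along with the download request. Below is a minimal sketch of how that could look, not a tested solution for this particular site. The class name `StreamingDownload` and its methods are made up for illustration, and it assumes the server accepts the session through a plain `Cookie` header and that you know the direct URL of the PDF.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;

// Hypothetical helper class, not part of jsoup or Android.
final class StreamingDownload {

    // Joins cookie name/value pairs into one Cookie request header value.
    static String cookieHeader(Map<String, String> cookies) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Copies the stream in 4 KB chunks, so only one chunk is held in memory.
    // Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
            total += count;
        }
        return total;
    }

    // Downloads fileUrl to target, sending the given cookies with the request.
    // Assumption: the server expects the session as a Cookie header.
    static long download(String fileUrl, Map<String, String> cookies, File target)
            throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(fileUrl).openConnection();
        try {
            if (!cookies.isEmpty()) {
                connection.setRequestProperty("Cookie", cookieHeader(cookies));
            }
            // Anything other than 200 OK is treated as a failure, so an
            // error page is not saved in place of the file.
            if (connection.getResponseCode() != HttpURLConnection.HTTP_OK) {
                throw new IOException(
                        "Server returned HTTP " + connection.getResponseCode());
            }
            try (InputStream in = connection.getInputStream();
                 OutputStream out = new FileOutputStream(target)) {
                return copy(in, out);
            }
        } finally {
            connection.disconnect();
        }
    }
}
```

From `doInBackground` you could then call something like `StreamingDownload.download(pdfUrl, res.cookies(), new File(PATH + name))`, where `res` is the jsoup login response from the question and `pdfUrl` is a placeholder for the actual file address. The file is written while it is being read, so its size is no longer limited by how much fits in memory.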

answered Oct 09 '22 at 02:10 by Alex Saskevich