 

Android: retrieve html of website certain time after request

My goal is to retrieve the HTML of a website as a readable String (which I have done), and to modify the code slightly so that the HTML is retrieved a certain time after the GET request is made.

Here's an example of what I'm trying to do: on the website http://time.gov/HTML5/, the HTML that appears right when the page loads is not the full HTML; after a few seconds, JavaScript executes and slightly modifies it. My goal is to get the modified HTML.

Here is what I have done to get the website html:

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        DownloadTask task = new DownloadTask();
        task.execute("http://time.gov/HTML5/");

    }

    private class DownloadTask extends AsyncTask<String, Void, String>{

        @Override
        protected String doInBackground(String... urls) {
            HttpResponse response = null;
            HttpGet httpGet = null;
            HttpClient mHttpClient = null;
            String s = "";

            try {
                if(mHttpClient == null){
                    mHttpClient = new DefaultHttpClient();
                }


                httpGet = new HttpGet(urls[0]);


                response = mHttpClient.execute(httpGet);
                s = EntityUtils.toString(response.getEntity(), "UTF-8");


            } catch (IOException e) {
                e.printStackTrace();
            } 
            return s;
        }

        @Override
        protected void onPostExecute(String result){
            final TextView textview1 = (TextView) findViewById(R.id.headline);
            textview1.setText(result);

        }
    }
}

This code correctly gets the unmodified HTML. However, I am trying to get the HTML a couple of seconds after the request is made (which will hopefully give the page enough time to update itself) by using Thread.sleep(5000), but this is not working. Does anyone know how to approach this problem?

user3866661 asked Aug 02 '14 19:08



2 Answers

As I understand your question, you need to fetch the HTML of a web page after the page has completely loaded (that is, after all the scripts inside the page have run).

AFAIK, you cannot achieve this with your current implementation. Once you call HttpClient.execute(), you cannot apply any delay within that call; it simply fetches the data that is currently available. A Handler does not help either: it would only delay the execute() call itself.

Unfortunately, we cannot set a listener on the client that provides a callback whenever the data in the web page changes (at least, I'm not aware of any such functionality).
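To make the point concrete, here is a minimal plain-Java sketch (not Android code, and not part of the original answer). It assumes a hypothetical local page, served by the JDK's built-in com.sun.net.httpserver.HttpServer, whose script would rewrite a paragraph in a browser. The names RawHtmlDemo, PAGE, fetch and fetchTwice are made up for illustration. Fetching the page twice with a pause in between should return the same raw markup, because an HTTP client downloads the document but never runs its scripts.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class RawHtmlDemo {
    // Hypothetical page: in a browser, the script would change "loading" to "ready".
    static final String PAGE =
        "<html><body><p id=\"t\">loading</p>"
        + "<script>setTimeout(function(){"
        + "document.getElementById('t').innerHTML='ready';},100);</script>"
        + "</body></html>";

    // Plain HTTP GET: returns the bytes the server sent, nothing more.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(url).openConnection(Proxy.NO_PROXY);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Serves PAGE locally, fetches it, waits, and fetches it again.
    static String[] fetchTwice(long delayMillis) {
        HttpServer server = null;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = PAGE.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            String first = fetch(url);
            Thread.sleep(delayMillis); // waiting does not make the script run
            String second = fetch(url);
            return new String[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        String[] pages = fetchTwice(500);
        System.out.println("identical: " + pages[0].equals(pages[1]));
    }
}
```

Both fetches still contain the untouched "loading" paragraph, which is why something that actually executes JavaScript is needed to see the modified page.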

However, you can achieve this with a completely different, yet painless, approach. Here is how to implement it:

  1. Place a WebView in your activity and keep it hidden.
  2. Load the web page in the WebView.
  3. Hook onPageFinished() in your WebViewClient implementation and, from there, pass the HTML content of the WebView to your JavaScriptInterface implementation.

The WebView:

In your layout XML

<WebView
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:id="@+id/my_web"
    android:visibility="gone"/>

In your Activity onCreate()

TextView textview1;

public void onCreate(Bundle savedInstanceState) {

    /* Your code here */

    textview1 = (TextView) findViewById(R.id.TextView1);

    // findViewById is called on the Activity itself; there is no 'view' variable here
    WebView web = (WebView) findViewById(R.id.my_web);
    web.getSettings().setJavaScriptEnabled(true);
    web.addJavascriptInterface(new CustomJavaScriptInterface(), "JavaScriptInterface");
    web.setWebViewClient(new CustomWebViewClient());
    web.loadUrl("http://time.gov/HTML5/");

    /* Your code here */
}

WebViewClient

private class CustomWebViewClient extends WebViewClient {
    @Override
    public void onPageFinished(WebView view, String url) {
        //Inject the HTML in to the JavaScriptInterface
        view.loadUrl("javascript:window.JavaScriptInterface.html('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
    }
}

JavaScriptInterface

private class CustomJavaScriptInterface {

    @JavascriptInterface
    public void html(final String html) {
        //Your HTML is here
        runOnUiThread(new Runnable() {
            @Override
            public void run() {
                setTextHtml(html);
            }
        });
        Log.e("HTML Length", Integer.toString(html.length()));
    }
}

private void setTextHtml(String html) {
    textview1.setText(html);
}

Conclusion:

To verify this, I put the line Log.e("HTML Length", Integer.toString(html.length())); in your AsyncTask's onPostExecute(), and this is what was logged:

08-05 14:29:59.886 13332-13332/com.sample.fetchhtml E/HTML Length﹕ 10438

Meanwhile, the log written from the html() method of CustomJavaScriptInterface is:

08-05 14:30:09.021 13332-13420/com.sample.fetchhtml E/HTML Length﹕ 22498

You can see the difference in size between the HTML strings I got in the two cases (10438 versus 22498 characters). Hope this helps.

Update (07 Aug): The delay in execution depends on how long the web page takes to load completely in the WebView. This approach is suitable for web pages that contain startup scripts. For a static web page, it's better to use HttpClient.execute().

gnuanu answered Oct 06 '22 00:10



You don't want to do long sleeps in an AsyncTask, because by default AsyncTasks run one at a time, so a sleeping task holds up every other AsyncTask. I would set a timer for 5 seconds and launch a second AsyncTask instance to do the second read.
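As a rough illustration of the "timer plus second task" idea, here is a plain-Java sketch rather than Android code. It uses a ScheduledExecutorService as a stand-in for the timer; on Android, something like Handler.postDelayed() starting a second AsyncTask would play the same role. The class DelayedSecondRead, the method readTwice and the reader parameter are hypothetical names introduced only for this example. The point is that the delay lives in the scheduler, so no task ever sleeps.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class DelayedSecondRead {
    /**
     * Runs the reader once immediately and once more after delayMillis,
     * returning both results. The reader stands in for the download step.
     */
    static String[] readTwice(Supplier<String> reader, long delayMillis) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        try {
            Callable<String> task = reader::get;
            // First read: no delay.
            ScheduledFuture<String> first =
                scheduler.schedule(task, 0, TimeUnit.MILLISECONDS);
            // Second read: queued by the scheduler, so the worker thread
            // stays free for other work until the delay has elapsed.
            ScheduledFuture<String> second =
                scheduler.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
            return new String[] {first.get(), second.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical reader that just reports when it was called.
        String[] reads = readTwice(() -> "read at " + System.nanoTime(), 5000);
        System.out.println(reads[0]);
        System.out.println(reads[1]);
    }
}
```

Note that this only spaces out two reads. If each read is a plain HTTP request, the second read returns whatever the server sends at that moment; it does not run the page's scripts.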

Gabe Sechan answered Oct 05 '22 22:10
