My goal is to retrieve the HTML of a website as a readable String (which I have done), and to modify the code slightly so that the HTML is retrieved a certain time after the GET request is made.
Here's an example of what I'm trying to do: on the website http://time.gov/HTML5/, the HTML that appears right when the page loads is not the full HTML; after a few seconds, JavaScript commands execute that slightly modify the HTML. My goal is to get the modified HTML.
Here is what I have done to get the website HTML:
public class MainActivity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        DownloadTask task = new DownloadTask();
        task.execute("http://time.gov/HTML5/");
    }

    private class DownloadTask extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(String... urls) {
            HttpResponse response = null;
            HttpGet httpGet = null;
            HttpClient mHttpClient = null;
            String s = "";
            try {
                if (mHttpClient == null) {
                    mHttpClient = new DefaultHttpClient();
                }
                httpGet = new HttpGet(urls[0]);
                response = mHttpClient.execute(httpGet);
                s = EntityUtils.toString(response.getEntity(), "UTF-8");
            } catch (IOException e) {
                e.printStackTrace();
            }
            return s;
        }

        @Override
        protected void onPostExecute(String result) {
            final TextView textview1 = (TextView) findViewById(R.id.headline);
            textview1.setText(result);
        }
    }
}
This code correctly gets the unmodified HTML. However, I am trying to get the HTML a couple of seconds after the request is made (which will hopefully give it enough time to update the HTML) by using Thread.sleep(5000), but this is not working. Does anyone know how to approach this problem?
As I understand your question, you need to fetch the HTML of a web page after the page has completely loaded (that is, after all the scripts inside the page have run).
AFAIK, you cannot achieve this with your current implementation. HttpClient.execute() simply returns the document the server sends; the client never runs the page's scripts, so no delay applied around that call will change the result. A Handler will not help either: it can only delay the execute() call itself.
Unfortunately, we cannot set a listener on the client that provides a callback whenever the data in the web page changes (at least I'm not aware of any such functionality).
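To make the point above concrete, here is a minimal plain-Java sketch (not Android code, and not part of the original answer). It assumes a throwaway local server and a made-up page whose script would update a clock in a real browser; the class name StaticFetchDemo and the page content are hypothetical. It only illustrates that an HTTP client receives the same markup no matter how long you wait, because nothing executes the script.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StaticFetchDemo {

    // Hypothetical page: in a browser, the script would replace "loading" after 500 ms.
    public static final String PAGE =
            "<html><body><div id=\"clock\">loading</div>"
            + "<script>setTimeout(function(){"
            + "document.getElementById('clock').textContent='12:00:00';},500);</script>"
            + "</body></html>";

    /** Serves PAGE from a temporary local server and returns the body as an HTTP client sees it. */
    public static String fetchOnce() {
        HttpServer server = null;
        try {
            // Port 0 lets the OS pick a free port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = PAGE.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();

            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String first = fetchOnce();
        Thread.sleep(1000); // Waiting does not help: no script engine is running anywhere.
        String second = fetchOnce();
        System.out.println("identical: " + first.equals(second));
        System.out.println("placeholder still present: " + second.contains(">loading<"));
    }
}
```

The script text is part of the response, but its effect on the DOM never appears, which is why something that actually runs JavaScript (such as a WebView) is needed.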
But you can achieve this using a completely different yet painless method. This is how you can implement it.
Add a WebView to your activity and keep it hidden.
Override onPageFinished() of your WebViewClient implementation and, from there, inject the HTML content of the WebView into your JavaScriptInterface implementation.
The WebView:
In your layout XML:
<WebView
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:id="@+id/my_web"
    android:visibility="gone"/>
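In your Activity onCreate():
TextView textview1;

@Override
public void onCreate(Bundle savedInstanceState) {
    /* Your code here */
    textview1 = (TextView) findViewById(R.id.TextView1);
    // Call findViewById() on the Activity itself; no 'view' variable is in scope here.
    WebView web = (WebView) findViewById(R.id.my_web);
    // JavaScript must be enabled so the page's scripts and the injected snippet can run.
    web.getSettings().setJavaScriptEnabled(true);
    // This name must match the object used in the injected script (window.JavaScriptInterface).
    web.addJavascriptInterface(new CustomJavaScriptInterface(), "JavaScriptInterface");
    web.setWebViewClient(new CustomWebViewClient());
    web.loadUrl("http://time.gov/HTML5/");
    /* Your code here */
}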
WebViewClient:
private class CustomWebViewClient extends WebViewClient {
    @Override
    public void onPageFinished(WebView view, String url) {
        // Inject the HTML into the JavaScriptInterface
        view.loadUrl("javascript:window.JavaScriptInterface.html('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
    }
}
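The injected string above hard-codes both the interface name and the callback method, so it silently breaks if either is renamed. As a small sketch (my own variation, not the answer's code), the helper below builds the same kind of javascript: URL from the two names. The class HtmlExtractionScript and its build() method are hypothetical. It also swaps in document.documentElement.outerHTML, a standard DOM property that serializes the root element including the attributes on the html tag, which the manual '<html>' + innerHTML + '</html>' concatenation drops.

```java
public final class HtmlExtractionScript {

    private HtmlExtractionScript() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Builds a javascript: URL that hands the serialized DOM to
     * window.<interfaceName>.<methodName>(...).
     *
     * @param interfaceName the name passed to addJavascriptInterface()
     * @param methodName    the @JavascriptInterface method that receives the HTML
     */
    public static String build(String interfaceName, String methodName) {
        return "javascript:window." + interfaceName + "." + methodName
                + "(document.documentElement.outerHTML);";
    }

    public static void main(String[] args) {
        System.out.println(build("JavaScriptInterface", "html"));
    }
}
```

Inside onPageFinished() this would be used as view.loadUrl(HtmlExtractionScript.build("JavaScriptInterface", "html")), assuming the interface was registered under that same name.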
JavaScriptInterface:
private class CustomJavaScriptInterface {
    @JavascriptInterface
    public void html(final String html) {
        // Your HTML is here. This method is not called on the UI thread,
        // so hand the result over to the UI thread before touching views.
        runOnUiThread(new Runnable() {
            @Override
            public void run() {
                setTextHtml(html);
            }
        });
        Log.e("HTML Length", Integer.toString(html.length()));
    }
}

private void setTextHtml(String html) {
    textview1.setText(html);
}
Conclusion:
To verify this, I put the line Log.e("HTML Length", Integer.toString(html.length())); in your AsyncTask's onPostExecute(), and this is what got logged:
08-05 14:29:59.886 13332-13332/com.sample.fetchhtml E/HTML Length﹕ 10438
At the same time, the log written from the html() method of the JavaScriptInterface is:
08-05 14:30:09.021 13332-13420/com.sample.fetchhtml E/HTML Length﹕ 22498
You can see the difference in the size of the HTML string in the two cases. Hope this helps.
Update (07 Aug): The delay in execution depends on the time the web page takes to load completely in the WebView. This approach is suitable for web pages that contain startup scripts. For a static web page, it's better to use HttpClient.execute().
You don't want to do long sleeps on an AsyncTask, because it will hold up any other AsyncTask. I would set a timer for 5 seconds and launch a second AsyncTask instance to do the second read.
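As a rough plain-Java sketch of the "timer, then a second read" idea: the class DelayedSecondRead and the method readAfterDelay() are hypothetical names, and the fetcher is whatever code performs the read. On Android you would more likely use Handler.postDelayed() to start the second AsyncTask; this version only shows the scheduling pattern without sleeping on a worker thread. Keep in mind the caveat from the answer above: a second plain HTTP read still returns what the server sends, so this helps only if the server-side content itself changes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class DelayedSecondRead {

    /**
     * Schedules fetcher to run once after delayMillis and returns a future
     * for its result. The calling thread is not blocked while waiting.
     */
    public static <T> CompletableFuture<T> readAfterDelay(Supplier<T> fetcher, long delayMillis) {
        // delayedExecutor (Java 9+) submits the task to the common pool after the delay.
        Executor delayed = CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS);
        return CompletableFuture.supplyAsync(fetcher, delayed);
    }

    public static void main(String[] args) {
        // Placeholder fetcher; a real one would perform the second HTTP read.
        // The answer suggests 5 seconds; 500 ms keeps the demo short.
        CompletableFuture<String> second = readAfterDelay(() -> "<html>second read</html>", 500);
        System.out.println("scheduled, caller is free to continue");
        System.out.println(second.join());
    }
}
```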