Reduce HTML using Applets

Question

My supervisor has tasked me with programmatically reducing a website's content by looking at the HTML tags to reveal only the core content. Importantly, this particular piece of the project must be written in Java.

Now that I have learnt about the differences between Plugins, Extensions, Applets, and Widgets, I think I want to use an Extension that calls a client-side Applet. My approach was going to be this:

1. Using the Google Chrome API, I was going to display a button that the user can click.
2. When it is clicked, a new browser tab opens with the Applet embedded in it.
3. The Applet automatically reads the original tab's HTML code and filters it.
4. Once filtered, the reduced copy of the original site is displayed.

So I have a few questions. To start, is it even possible to use an Extension with an Applet? Moreover, is it possible for an Applet to look at another tab's HTML code? If not, is it possible to just reload the original tab with the Applet now embedded within it and complete the function that way? Thanks.

Paul · Accepted Answer

JavaScript is already on most mobile web platforms. Java is not, and there is no reasonable way mobile customers will be able to install Java. Android, which runs many, but not all, mobile devices, has a Java runtime environment and is basically a loader for Java apps. But an Apple iPhone is not an Android device, and neither is a Windows Phone.

If you want to summarize content on the client, and in JavaScript, as I see it you have two choices:

1. Succeed, through some inner burst of genius, where dozens of the best expert PhDs in Natural Language Computing have only just begun exploring how to extract "true meaning" from text; or
2. Look at document.title and be done with it.

The second approach assumes that web page authors set a title, and set one appropriate for summarizing their website. This isn't a perfect assumption, but it is OK most of the time. It is also a lot less expensive than the first approach.

With the first approach you can get a head start with a "natural language toolkit" that can do things like scan text for unusual words and phrases. To get a rough idea of the kinds of software that have been built in this area, see the toolkits section of the Wikipedia article "Outline of natural language processing". A popular toolkit for Python is NLTK. Whether you use a toolkit from Java or Python, it means working on the server, because the client will not have the storage, network speed, or CPU. For Python there are server-side web frameworks like Django or web2py that can speed up building a server app, and for Java there are servlet frameworks. Ultimately you'll need a lot of help, training, or luck. As I hinted above, this can easily be beyond the capabilities of a small team of fresh hires, and certainly beyond what a single new developer eager to prove his or her capabilities can do in a few weeks on their own with limited help.
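To make "scan text for unusual words" a little more concrete on the Java side, here is a deliberately crude sketch: split the text into words, drop a small hand-picked stop-word list, and rank the rest by frequency. The class name `KeywordSketch` and the stop-word list are assumptions for illustration only. This is plain term counting, not what NLTK or a comparable toolkit does, and it says nothing about "true meaning".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper: ranks non-stop-words by how often they occur. */
final class KeywordSketch {
    // Tiny, illustrative stop-word list; real toolkits ship much larger ones.
    private static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
        "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with");

    /** Returns up to {@code n} of the most frequent non-stop-words, most frequent first. */
    static List<String> topKeywords(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter or digit; lower-case so "Cat" and "cat" merge.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts.entrySet().stream()
            // Highest count first; ties broken alphabetically so the output is deterministic.
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

For example, `KeywordSketch.topKeywords("The cat sat. The cat ran. A dog barked.", 2)` returns `[cat, barked]`: "cat" occurs twice, and "barked" wins the alphabetical tie among the words that occur once.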

Most web pages have titles set like this near the beginning of the downloaded HTML:

<head><title>My Furry Kittens!</title></head>

You don't need to write a parser. If you are running in the browser, the title has already been parsed into the DOM (Document Object Model). The string "My Furry Kittens!" in this example would be available as document.title.
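Outside a browser, for example in Java code running on a server or in a proxy, there is no DOM and no document.title. For the narrow job of reading the title, a regular expression over the raw HTML is often enough. The sketch below assumes the page has one ordinary `<title>` element; the class name `TitleExtractor` is hypothetical. It does not decode entities such as `&amp;` and is not a general HTML parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls the text of the first <title> element out of raw HTML. */
final class TitleExtractor {
    // Case-insensitive, DOTALL so the title may span lines; the lazy group stops at the first </title>.
    private static final Pattern TITLE = Pattern.compile(
        "<title(?:\\s[^>]*)?>(.*?)</title\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the whitespace-normalized title, or empty if there is no non-blank title. */
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        // Collapse runs of whitespace (including newlines) into single spaces, then trim.
        String title = m.group(1).replaceAll("\\s+", " ").trim();
        return title.isEmpty() ? Optional.empty() : Optional.of(title);
    }
}
```

For the example above, `TitleExtractor.extractTitle("<head><title>My Furry Kittens!</title></head>")` returns an `Optional` containing "My Furry Kittens!".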

If you like, you could put a button into a plugin and let people push it to summarize the website. Or they could just look at the title; it is already on the page. Of course, if the goal is to scrape titles, one can avoid writing a parser and use a "fake" headless scriptable browser like PhantomJS or similar.

You can read more about document.title on the Mozilla Developer Network (MDN). MDN is a great reference for learning how web browsers work, and it is maintained by Mozilla, the organization behind the Firefox browser. Most of what you learn there will also apply to Chrome, Internet Explorer, and various mobile platforms.

Good Luck!

Chomeh · Answer

How about implementing a local proxy server on the mobile device? The browser would only need to be configured to use the proxy, and the custom proxy implementation could then transform the requested HTML however it likes.
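The interesting part of such a proxy is the transform step, which could be written in Java as the question requires. Below is a minimal sketch of one possible transform, assuming the proxy hands it each HTML response body as a String: it keeps the title and visible text and drops the contents of `<script>` and `<style>`, using the JDK's built-in `javax.swing.text.html.parser.ParserDelegator`. The class name `HtmlReducer` is hypothetical, and the proxy plumbing (listening, forwarding requests, HTTPS) is not shown; note that a plain proxy cannot see inside HTTPS traffic without intercepting it. This parser is based on an old HTML 3.2 DTD, and the sketch does no "core content" detection (navigation, ads, and footers all survive), so treat it as the mechanical first step only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: reduces an HTML page to its title and visible text, one block per line. */
final class HtmlReducer {

    static String reduce(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Greater than 0 while inside <script> or <style>, whose contents are not visible text.
            private int hiddenDepth = 0;

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.SCRIPT || t == HTML.Tag.STYLE) {
                    hiddenDepth++;
                } else if (t.breaksFlow()) {
                    newLine(); // block-level tags such as p, div, li start a new line
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.SCRIPT || t == HTML.Tag.STYLE) {
                    hiddenDepth = Math.max(0, hiddenDepth - 1);
                } else if (t.breaksFlow()) {
                    newLine();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (hiddenDepth == 0) {
                    out.append(data);
                }
            }

            // Appends a line break unless the output is empty or already ends with one.
            private void newLine() {
                int len = out.length();
                if (len > 0 && out.charAt(len - 1) != '\n') {
                    out.append('\n');
                }
            }
        };
        try {
            // true = ignore charset directives in the page; the input is already a Java String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a StringReader
        }
        return out.toString().trim();
    }
}
```

A proxy would call `HtmlReducer.reduce(body)` on responses whose Content-Type is `text/html` and pass everything else through unchanged; smarter filtering (for example, keeping only the largest block of paragraph text) would go in the callback.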


Tags: java, html, google-chrome, applet

Koffy

