
How to detect the main article tag like Evernote Web Clipper does

When I tried the Evernote Clipper extension, I found a very useful feature: when I click "Article", it extracts the main content of the page quite accurately. Here is the result of using Evernote Clipper on the page https://developer.chrome.com/extensions/api_index:

[image: extract article in a page]

I looked at the main article that Evernote picks out on several pages, and it is in fact taken from the first article tag. However, Evernote Clipper still works well on pages that don't use that tag.

How does Evernote Clipper do that? Is there a JS library that can detect the element containing the main content of a page? Any advice on how to do this would be appreciated.

Thank you in advance!

asked Jul 21 '14 04:07 by yelliver




1 Answer

As far as I know, there is no universal JS library for this. Evernote Clipper uses its own method to extract the "interesting" content from a web page. You can read the extension's source code to try to understand the process.

On my Mac, the path to the Chrome extension is:

~/Library/Application Support/Google/Chrome/Default/Extensions/pioclpoplcdbaefihamjohnefbikjilc/6.2_0/

Here's another tool that works in much the same way: https://www.readability.com/

You can also check this thread: What algorithm does Readability use for extracting text from URLs?

Or search Google for terms like 'content extraction js lib'. (I found this one: https://github.com/hatena/extract-content-javascript)
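
To give a feel for what such a heuristic can look like, here is a minimal TypeScript sketch. It is not Evernote's or Readability's actual algorithm, only an illustration built on two assumptions: prefer the first article element if the page has one, and otherwise pick the element whose direct child paragraphs hold the most text. The names ElementLike, collect, paragraphScore and findMainContent are hypothetical, and ElementLike is a cut-down stand-in for a DOM element so the sketch can run outside a browser.

```typescript
// Hypothetical minimal view of a DOM element (assumption: only these three
// members are needed, so the sketch does not depend on a browser).
interface ElementLike {
  tagName: string;
  textContent: string | null;
  children: ArrayLike<ElementLike>;
}

// Collect an element and all of its descendants in document order.
function collect(el: ElementLike, out: ElementLike[] = []): ElementLike[] {
  out.push(el);
  for (let i = 0; i < el.children.length; i++) {
    collect(el.children[i], out);
  }
  return out;
}

// Score an element by the total text length of its direct <p> children.
function paragraphScore(el: ElementLike): number {
  let score = 0;
  for (let i = 0; i < el.children.length; i++) {
    const child = el.children[i];
    if (child.tagName.toUpperCase() === "P") {
      score += (child.textContent || "").trim().length;
    }
  }
  return score;
}

// Return the first <article> if there is one; otherwise the element with the
// highest paragraph score, or null when no paragraph text is found at all.
function findMainContent(root: ElementLike): ElementLike | null {
  let best: ElementLike | null = null;
  let bestScore = 0;
  for (const el of collect(root)) {
    if (el.tagName.toUpperCase() === "ARTICLE") {
      return el;
    }
    const score = paragraphScore(el);
    if (score > bestScore) {
      best = el;
      bestScore = score;
    }
  }
  return best;
}
```

In a content script, something like findMainContent(document.body) should work, since real DOM elements expose the same three members. Real extractors go further, for example by penalising link-heavy blocks and boosting elements whose class or id looks like "content" or "post".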

Hope this helps

answered Sep 28 '22 21:09 by Laurent Sarrazin