
Java: Extract all links with a certain word in them with JSoup?

This may be an unclear question, so here is the code and an explanation:

    Document doc = Jsoup.parse(exampleHtmlData);

    Elements certainLinks = doc.select("a[href=google.com/example/]");

The String exampleHtmlData contains the HTML source of a certain site. This site has many links that point to Google. A few examples:

http://google.com/example/hello 
http://google.com/example/certaindir/anotherdir/something
http://google.com/anotherexample

I want to use doc.select to extract every link whose URL contains google.com/example/. How do I do this with JSoup?

ZimZim asked Jun 10 '12 20:06



1 Answer

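You can refer to the jsoup selector syntax. The [attr=value] selector only matches when the attribute value equals the given string exactly, whereas [attr*=value] matches when the attribute value contains it as a substring:

    Document doc = Jsoup.parse(exampleHtmlData);
    // *= matches any href that contains "google.com/example/"
    Elements certainLinks = doc.select("a[href*=google.com/example/]");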
Akhi answered Oct 23 '22 10:10
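
To see the difference between the two selectors on the example links from the question, here is a minimal, self-contained sketch. It assumes jsoup is on the classpath; the class name GoogleExampleLinks, the SAMPLE_HTML constant, and the helper methods are made up for illustration and are not part of the original code.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical demo class; only Jsoup.parse, Document.select and
// Elements.eachAttr are jsoup API calls.
class GoogleExampleLinks {

    // Assumed sample input built from the three URLs in the question.
    static final String SAMPLE_HTML =
            "<a href=\"http://google.com/example/hello\">one</a>"
          + "<a href=\"http://google.com/example/certaindir/anotherdir/something\">two</a>"
          + "<a href=\"http://google.com/anotherexample\">three</a>";

    // Returns the href of every <a> whose href contains "google.com/example/".
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        // [href*=...] is a substring match on the attribute value.
        Elements links = doc.select("a[href*=google.com/example/]");
        return links.eachAttr("href");
    }

    // Counts matches for the selector from the question, which uses "=".
    static int countExactMatches(String html) {
        Document doc = Jsoup.parse(html);
        // [href=...] only matches when the whole attribute value is equal.
        return doc.select("a[href=google.com/example/]").size();
    }

    public static void main(String[] args) {
        for (String href : extractLinks(SAMPLE_HTML)) {
            System.out.println(href);
        }
        System.out.println("exact matches: " + countExactMatches(SAMPLE_HTML));
    }
}
```

With this sample input, the substring selector returns the first two links and skips http://google.com/anotherexample, while the exact-match selector from the question finds nothing, because no href is equal to google.com/example/ as a whole.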