
What is the best way to search through HTML in a C# string for specific text and mark the text?

Tags: html, string, c#

What would be the best way to search through HTML inside a C# string variable to find a specific word/phrase and mark (or wrap) that word/phrase with a highlight?

Thanks,

Jeff

Yttrium asked Jan 19 '09 04:01



3 Answers

I like using the Html Agility Pack. It is very easy to use, and although there haven't been many updates lately, it is still usable. For example, to grab all the links on a page:

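using System;
using HtmlAgilityPack;

HtmlWeb client = new HtmlWeb();
HtmlDocument doc = client.Load("http://yoururl.com");
// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href]");

if (nodes != null)
{
    foreach (HtmlNode link in nodes)
    {
        Console.WriteLine(link.Attributes["href"].Value);
    }
}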
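The snippet above lists links rather than highlighting text. As a rough sketch of how the same library might be applied to the original question, the code below walks the text nodes and wraps each match in a span. The class name `HapHighlighter`, the method `Highlight`, and the CSS class `highlight` are made up for illustration. It also assumes that an extra wrapper `<span>` around each affected text node is acceptable, and it matches against the raw (still entity-encoded) text, so a term such as "amp" could hit the inside of `&amp;`.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class HapHighlighter
{
    public static string Highlight(string html, string term)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(term)) return html;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Escape the term so it is matched literally, ignoring case.
        var pattern = new Regex(Regex.Escape(term), RegexOptions.IgnoreCase);

        // Only visit text nodes, and skip script/style content.
        HtmlNodeCollection textNodes = doc.DocumentNode.SelectNodes(
            "//text()[not(ancestor::script) and not(ancestor::style)]");
        if (textNodes == null) return html; // no text nodes at all

        // Snapshot the list, because the loop modifies the tree.
        foreach (HtmlNode textNode in textNodes.ToList())
        {
            string raw = textNode.InnerHtml; // raw source text of the node
            if (!pattern.IsMatch(raw)) continue;

            // Assumption: an extra wrapper span per text node is acceptable.
            HtmlNode wrapper = doc.CreateElement("span");
            wrapper.InnerHtml = pattern.Replace(raw, "<span class=\"highlight\">$0</span>");
            textNode.ParentNode.ReplaceChild(wrapper, textNode);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

A call such as `HapHighlighter.Highlight(html, "word")` returns the modified markup as a string. Because only text nodes are touched, tag names and attribute values that happen to contain the term are left alone.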
Zen answered Oct 20 '22 18:10


A regular expression would be my way. ;)

Eddie Parker answered Oct 20 '22 17:10


If the HTML you're using is XHTML-compliant, you could load it as an XML document and then use XPath/XSL. Long-winded, but kind of elegant?

An approach I used in the past is to use HTMLTidy to convert messy HTML to XHTML, and then use XSL/XPath for screen scraping content into a database, to create a reverse content management system.
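As a minimal sketch of the XML route, assuming the input is already well-formed XHTML (for example after a tidy step), the code below uses LINQ to XML instead of XSL to find text nodes and wrap matches. The names `XhtmlHighlighter`, `Highlight`, and the CSS class `highlight` are hypothetical. Input containing named HTML entities such as `&nbsp;` is likely to be rejected by the XML parser unless those entities are declared.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper; assumes well-formed XHTML input.
public static class XhtmlHighlighter
{
    public static string Highlight(string xhtml, string term)
    {
        if (string.IsNullOrEmpty(xhtml) || string.IsNullOrEmpty(term)) return xhtml;

        XDocument doc = XDocument.Parse(xhtml, LoadOptions.PreserveWhitespace);
        var pattern = new Regex(Regex.Escape(term), RegexOptions.IgnoreCase);

        // Snapshot matching text nodes first, because the loop edits the tree.
        List<XText> textNodes = doc.DescendantNodes().OfType<XText>()
            .Where(t => t.Parent != null
                && !t.Ancestors().Any(a => a.Name.LocalName == "script" || a.Name.LocalName == "style")
                && pattern.IsMatch(t.Value))
            .ToList();

        foreach (XText textNode in textNodes)
        {
            // Reuse the parent's namespace so the new span needs no xmlns override.
            XNamespace ns = textNode.Parent.Name.Namespace;
            string value = textNode.Value;
            var pieces = new List<object>();
            int position = 0;

            foreach (Match m in pattern.Matches(value))
            {
                if (m.Index > position)
                    pieces.Add(new XText(value.Substring(position, m.Index - position)));
                pieces.Add(new XElement(ns + "span", new XAttribute("class", "highlight"), m.Value));
                position = m.Index + m.Length;
            }
            if (position < value.Length)
                pieces.Add(new XText(value.Substring(position)));

            // Swap the single text node for text/span/text pieces.
            textNode.ReplaceWith(pieces.ToArray());
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Because the term is matched against decoded text node values and the result is serialized by the XML writer, escaping of characters such as `&` and `<` is handled for you, which is one practical advantage of this approach over editing the raw string.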

Regular expressions would do it, but could get complicated once you try to strip out tags, image names, etc., to avoid false positives.
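To illustrate the false-positive problem, here is a rough regex-only sketch that splits the string on tags and highlights only the text between them. The names `RegexHighlighter`, `Highlight`, and the CSS class `highlight` are invented for this example. The tag pattern is a deliberate simplification: it will misfire on a `>` inside an attribute value, it does not skip script or style content, and it matches against raw text without decoding entities.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; a simplification, not a real HTML parser.
public static class RegexHighlighter
{
    // Treats anything from '<' to the next '>' as a tag.
    private static readonly Regex TagPattern = new Regex(@"<[^>]*>");

    public static string Highlight(string html, string term)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(term)) return html;

        var termPattern = new Regex(Regex.Escape(term), RegexOptions.IgnoreCase);
        const string replacement = "<span class=\"highlight\">$0</span>";

        var result = new StringBuilder(html.Length + 64);
        int position = 0;

        foreach (Match tag in TagPattern.Matches(html))
        {
            // Highlight only the text that sits before this tag.
            string text = html.Substring(position, tag.Index - position);
            result.Append(termPattern.Replace(text, replacement));

            // Copy the tag itself through untouched.
            result.Append(tag.Value);
            position = tag.Index + tag.Length;
        }

        // Text after the last tag (or the whole string if there were no tags).
        result.Append(termPattern.Replace(html.Substring(position), replacement));
        return result.ToString();
    }
}
```

For example, `RegexHighlighter.Highlight("<p class=\"word\">a word here</p>", "word")` returns `<p class="word">a <span class="highlight">word</span> here</p>`: the occurrence inside the `class` attribute is skipped, while the one in the text is wrapped.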

MrTelly answered Oct 20 '22 18:10