Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# Regex extract content of a div

Tags:

c#

regex

I've seen some related questions of mine, and I tried them but they don't work. I want to match the content from a div with the id "thumbs". But the regex.Success returns false :(

Match regex = Regex.Match(html, @"<div[^>]*id=""thumbs"">(.+?)</div>");
like image 870
Bart Wesselink Avatar asked Jul 04 '13 12:07

Bart Wesselink


People also ask

What C is used for?

C programming language is a machine-independent programming language that is mainly used to create many types of applications and operating systems such as Windows, and other complicated programs such as the Oracle database, Git, Python interpreter, and games and is considered a programming foundation in the process of ...

What is the full name of C?

In the real sense it has no meaning or full form. It was developed by Dennis Ritchie and Ken Thompson at AT&T bell Lab. First, they used to call it as B language then later they made some improvement into it and renamed it as C and its superscript as C++ which was invented by Dr. Stroustroupe.

Is C language easy?

C is a general-purpose language that most programmers learn before moving on to more complex languages. From Unix and Windows to Tic Tac Toe and Photoshop, several of the most commonly used applications today have been built on C. It is easy to learn because: A simple syntax with only 32 keywords.

What is C language?

C is an imperative procedural language supporting structured programming, lexical variable scope, and recursion, with a static type system. It was designed to be compiled to provide low-level access to memory and language constructs that map efficiently to machine instructions, all with minimal runtime support.


1 Answers

Regex is not a good choice for parsing HTML files..

HTML is not strict nor is it regular with its format..

Use htmlagilitypack


Why use parser?

Consider your regex..There are infinite number of cases where you could break your code

  • Your regex won't work if there are nested divs
  • Some divs dont have an ending tag!(except XHTML)

You can use this code to retrieve it using HtmlAgilityPack

HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream);

var itemList = doc.DocumentNode.SelectNodes("//div[@id='thumbs']")//this xpath selects all div with thubs id
                  .Select(p => p.InnerText)
                  .ToList();

//itemList now contain all the div tags content having its id as thumbs
like image 141
Anirudha Avatar answered Sep 28 '22 10:09

Anirudha