Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# version of HTML Tidy?

I am just looking for a really easy way to clean up some HTML (possibly with embedded JavaScript code). I tried two different HTML Tidy .NET ports and both are throwing exceptions...

Sorry, by "clean" I mean "indent". The HTML is not malformed, at all. It's XHTML strict.


I finally got something working with SGML, but this is seriously the most ridiculous chunk of code ever to indent some HTML.

private static string FormatHtml(string input)
{
    var sgml = new SgmlReader {DocType = "HTML", InputStream = new StringReader(input)};
    using (var sw = new StringWriter())
    using (var xw = new XmlTextWriter(sw) { Indentation = 2, Formatting = Formatting.Indented })
    {
        sgml.Read();
        while (!sgml.EOF)
            xw.WriteNode(sgml, true);
    }
    return sw.ToString();
}
like image 839
mpen Avatar asked Oct 23 '10 03:10

mpen


People also ask

What C is used for?

C programming language is a machine-independent programming language that is mainly used to create many types of applications and operating systems such as Windows, and other complicated programs such as the Oracle database, Git, Python interpreter, and games and is considered a programming foundation in the process of ...

What is the full name of C?

In the real sense it has no meaning or full form. It was developed by Dennis Ritchie and Ken Thompson at AT&T bell Lab. First, they used to call it as B language then later they made some improvement into it and renamed it as C and its superscript as C++ which was invented by Dr. Stroustroupe.

Is C language easy?

C is a general-purpose language that most programmers learn before moving on to more complex languages. From Unix and Windows to Tic Tac Toe and Photoshop, several of the most commonly used applications today have been built on C. It is easy to learn because: A simple syntax with only 32 keywords.

How old is the letter C?

The letter c was applied by French orthographists in the 12th century to represent the sound ts in English, and this sound developed into the simpler sibilant s.


1 Answers

AngleSharp 100% c#

    var parser = new HtmlParser();
    
    var document = parser.Parse("<html><head></head><body><i></i></body></html>");

    var sw = new StringWriter();
    document.ToHtml(sw, new PrettyMarkupFormatter());

    var HTML_prettified = sw.ToString();

edit by sebastian :

 //old parse method
 var document = parser.Parse("<html><head></head><body><i></i></body></html>");

 //new parse method  -Updated version (Nuget Package AngleSharp 0.16.1): 
 var document = await parser.ParseDocumentAsync(Code); 
 
like image 147
bh_earth0 Avatar answered Sep 27 '22 18:09

bh_earth0