Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Validate HTML5 in C#

Tags:

html

c#

selenium

We are currently building a green-fields app in C#. We have extensive UI tests which use Selenium Web Driver. These tests ( as well as unit tests ) are run by our CI server.

Selenium exposes a .PageSource attribute, and it makes sense (to me) to run that source through a HTML5 validator as another part each UI test.

I want to pick up on the same sorts of things that http://validator.w3.org/ picks up on. As a bonus, I would also like to pick up on a 508 issues.

My problem is that I can't find anything that will do this locally and is easy to integrate into my UI tests.. The W3C site exposes a SOAP api, however I don't want to hit their site as part of the CI process. They also don't appear to support getting SOAP responses back. I would like to avoid installing a full W3C server locally.

The closest thing that I see is this http://www.totalvalidator.com/, using it would require writing temp files and parsing reports.

I thought I'd see if anyone knows of another way before I go down this track. Preferably a DotNet assembly that I can call.

c

like image 429
Dave Avatar asked Oct 05 '13 05:10

Dave


People also ask

How do I validate my HTML code?

In order to validate your code, you have to declare the standard to which it adheres. To describe the HTML standard (the document type declaration, DTD), the file should contain a DOCTYPE declaration (before the HTML code). Here are a few examples (from http://www.htmlhelp.com/tools/validator/doctype.html).

What can you use to validate an HTML5 file?

The most common three validators you'll use are: Validator.nu: A new-school validator that validates HTML5, ARIA, SVG 1.1 and MathML 2.0: it goes through the entire document pointing out places where your markup doesn't follow that doctype correctly (ie where there are errors).

What is HTML5 validation?

Also, HTML5 validation helps users inputting data by providing specific controls, such as date pickers and custom on-screen keyboards. HTML5 input types are displayed as simple text input fields in older web browsers that do not support these HTML5 features. The example below shows these HTML5 input types in action.

Does HTML5 have form validation?

HTML5 also makes it easier to control styling with CSS. In addition, it also provides built-in validation features through the use of special attributes and new input types.


2 Answers

After spending an entire weekend on this problem, the only solution I can see is a commercial library called CSE HTML Validator

It is located here http://www.htmlvalidator.com/htmldownload.html

I wrote a simple wrapper for it. Here is the code

using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

[assembly: CLSCompliant(true)]
namespace HtmlValidator
{

public class Validator
{
    #region Constructors...

    public Validator(string htmlToValidate)
    {
        HtmlToValidate = htmlToValidate;
        HasExecuted = false;
        Errors = new List<ValidationResult>();
        Warnings = new List<ValidationResult>();
        OtherMessages = new List<ValidationResult>();

    }

    #endregion



    #region Properties...
    public IList<ValidationResult> Errors { get; private set; }
    public bool HasExecuted { get; private set; }
    public string HtmlToValidate { get; private set; }
    public IList<ValidationResult> OtherMessages { get; private set; }
    public string ResultsString { get; private set; }
    public string TempFilePath { get; private set; }
    public IList<ValidationResult> Warnings { get; private set; }
    #endregion



    #region Public methods...
    public void ValidateHtmlFile()
    {

        WriteTempFile();

        ExecuteValidator();

        DeleteTempFile();

        ParseResults();

        HasExecuted = true;
    }

    #endregion



    #region Private methods...
    private void DeleteTempFile()
    {
        TempFilePath = Path.GetTempFileName();
        File.Delete(TempFilePath);
    }


    private void ExecuteValidator()
    {
        var psi = new ProcessStartInfo(GetHTMLValidatorPath())
        {
            RedirectStandardInput = false,
            RedirectStandardOutput = true,
            RedirectStandardError = false,
            UseShellExecute = false,
            Arguments = String.Format(@"-e,(stdout),0,16 ""{0}""", TempFilePath)
        };

        var p = new Process
        {
            StartInfo = psi
        };
        p.Start();

        var stdOut = p.StandardOutput;

        ResultsString = stdOut.ReadToEnd();
    }


    private static string GetHTMLValidatorPath()
    {
        return @"C:\Program Files (x86)\HTMLValidator120\cmdlineprocessor.exe";
    }


    private void ParseResults()
    {
        var results = JsonConvert.DeserializeObject<dynamic>(ResultsString);
        IList<InternalValidationResult> messages = results.messages.ToObject<List<InternalValidationResult>>();


        foreach (InternalValidationResult internalValidationResult in messages)
        {
            ValidationResult result = new ValidationResult()
            {
                Message = internalValidationResult.message,
                LineNumber = internalValidationResult.linenumber,
                MessageCategory = internalValidationResult.messagecategory,
                MessageType = internalValidationResult.messagetype,
                CharLocation = internalValidationResult.charlocation
            };

            switch (internalValidationResult.messagetype)
            {
                case "ERROR":
                    Errors.Add(result);
                    break;

                case "WARNING":
                    Warnings.Add(result);
                    break;

                default:
                    OtherMessages.Add(result);
                    break;
            }
        }
    }


    private void WriteTempFile()
    {
        TempFilePath = Path.GetTempFileName();
        StreamWriter streamWriter = File.AppendText(TempFilePath);
        streamWriter.WriteLine(HtmlToValidate);
        streamWriter.Flush();
        streamWriter.Close();
    }
    #endregion
}
}




public class ValidationResult
{
    public string MessageType { get; set; }
    public string MessageCategory { get; set; }
    public string Message { get; set; }
    public int LineNumber { get; set; }
    public int CharLocation { get; set; }


    public override string ToString()
    {
        return String.Format("{0} Line {1} Char {2}:: {3}", this.MessageType, this.LineNumber, this.CharLocation, this.Message);

    }

}


public class InternalValidationResult
{
    /*
     * DA: this class is used as in intermediate store of messages that come back from the underlying validator. The fields must be cased as per the underlying Json object.
     * That is why they are ignored.
     */
    #region Properties...
    [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1709:IdentifiersShouldBeCasedCorrectly", MessageId = "charlocation"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1704:IdentifiersShouldBeSpelledCorrectly", MessageId = "charlocation")]
    public int charlocation { get; set; }
    [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1709:IdentifiersShouldBeCasedCorrectly", MessageId = "linenumber"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1704:IdentifiersShouldBeSpelledCorrectly", MessageId = "linenumber")]

    public int linenumber { get; set; }
    [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1709:IdentifiersShouldBeCasedCorrectly", MessageId = "message"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1704:IdentifiersShouldBeSpelledCorrectly", MessageId = "message")]

    public string message { get; set; }
    [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1704:IdentifiersShouldBeSpelledCorrectly", MessageId = "messagecategory"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1709:IdentifiersShouldBeCasedCorrectly", MessageId = "messagecategory")]
    public string messagecategory { get; set; }
    [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1709:IdentifiersShouldBeCasedCorrectly", MessageId = "messagetype"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1704:IdentifiersShouldBeSpelledCorrectly", MessageId = "messagetype")]

    public string messagetype { get; set; }
    #endregion
}

Usage/Testing

   private const string ValidHtml = "<!DOCType html><html><head></head><body><p>Hello World</p></body></html>";
    private const string BrokenHtml = "<!DOCType html><html><head></head><body><p>Hello World</p></body>";

    [TestMethod]
    public void CanValidHtmlStringReturnNoErrors()
    {
        Validator subject = new Validator(ValidHtml);
        subject.ValidateHtmlFile();
        Assert.IsTrue(subject.HasExecuted);
        Assert.IsTrue(subject.Errors.Count == 0);
    }


    [TestMethod]
    public void CanInvalidHtmlStringReturnErrors()
    {
        Validator subject = new Validator(BrokenHtml);
        subject.ValidateHtmlFile();
        Assert.IsTrue(subject.HasExecuted);
        Assert.IsTrue(subject.Errors.Count > 0);
        Assert.IsTrue(subject.Errors[0].ToString().Contains("ERROR"));
    }
like image 188
Dave Avatar answered Oct 20 '22 08:10

Dave


The best HTML5 validator, the nu checker, is in Java and hard to interface with from .NET. But libtidy can be written into a C++ dll to be called from managed code. The sample program they've posted did a good job for me, with a little adapting.

LibTidy.h:

public ref class LibTidy
{
public:
    System::String^ __clrcall Test(System::String^ input);
};

LibTidy.cpp:

System::String^ __clrcall LibTidy::Test(System::String^ input)
{
    CStringW cstring(input);
  
    const size_t newsizew = (cstring.GetLength() + 1) * 2;
    char* nstringw = new char[newsizew];
    size_t convertedCharsw = 0;
    wcstombs_s(&convertedCharsw, nstringw, newsizew, cstring, _TRUNCATE);

        TidyBuffer errbuf = { 0 };
        int rc = -1;
        Bool ok;

        TidyDoc tdoc = tidyCreate();                     // Initialize "document"
                
        ok = tidyOptSetBool(tdoc, TidyShowInfo, no);
        ok = tidyOptSetBool(tdoc, TidyQuiet, yes);
        ok = tidyOptSetBool(tdoc, TidyEmacs, yes);
        if (ok)
            rc = tidySetErrorBuffer(tdoc, &errbuf);      // Capture diagnostics
        if (rc >= 0)
            rc = tidyParseString(tdoc, nstringw);           // Parse the input
        if (rc >= 0)
            rc = tidyCleanAndRepair(tdoc);               // Tidy it up!
        if (rc >= 0)
            rc = tidyRunDiagnostics(tdoc);               // Kvetch
        char* outputBytes = (char*)errbuf.bp;
    
        if (errbuf.allocator != NULL) tidyBufFree(&errbuf);
        tidyRelease(tdoc);

        return gcnew System::String(outputBytes);
    }
like image 1
andruo11 Avatar answered Oct 20 '22 08:10

andruo11