aspNetHidden div not being served depending on client

I am developing a C# app that gets web pages and processes their contents line by line. To do this, I use the HttpClient class, and read the page contents through ReadAsStreamAsync(). Then I read the stream into a line array and iterate over it. So far so good.

However, the HTML that I obtain with this method is not identical to the HTML that I observe if I navigate to the web page using Chrome or Edge and use View Source to get to the HTML. In particular, the __VIEWSTATE and __VIEWSTATEGENERATOR hidden input elements are surrounded by div elements with class="aspNetHidden" when I use the browser, but not when I get the HTML programmatically. This ruins my line tracking logic as there are extra lines in the page as seen by the browser in relation to the page I am getting in code.

EDIT. After some testing, I am confident that the user agent header employed by the client is what determines whether or not the class="aspNetHidden" div is served. When I mimic my browser's user agent ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36 Edg/83.0.478.37"), the div is served; if I use some other agent such as "Test Client", the div is not served.

My question then is, is there any documentation on what user agent strings cause the div to be served and which don't? Also, can I prevent this from happening?

Thanks.

CesarGon asked May 26 '20 15:05


2 Answers

In short, the behavior is not documented or specified in terms of user agent strings, but in terms of browser capabilities.

Based on the browser's user agent, ASP.NET sets up a set of capabilities.
These capabilities are configured in .browser configuration files on the web server.
For .NET 4, for example, you will find these files in %SystemRoot%\Microsoft.NET\Framework\v4.0.30319\config\browsers,
e.g. chrome.browser, iphone.browser, etc.

Such a .browser file can declare a tagwriter capability.
For example, chrome.browser:

<browsers>
    <!-- Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/530.1 (KHTML, like Gecko) Chrome/2.0.168.0 Safari/530.1 -->
    <browser id="Chrome" parentID="WebKit">
        <identification>
            <userAgent match="Chrome/(?'version'(?'major'\d+)(\.(?'minor'\d+)?)\w*)" />
        </identification>

        <capabilities>
          <capability name="browser"   value="Chrome" />
          <capability name="tagwriter" value="System.Web.UI.HtmlTextWriter" />

          <!-- ... -->  
        </capabilities>
    </browser>
</browsers> 

The tagwriter capability specifies whether a System.Web.UI.HtmlTextWriter or a System.Web.UI.Html32TextWriter will be instantiated to write the output.

The default configuration in the Default.browser file declares tagwriter as:

<capability name="tagwriter" value="System.Web.UI.Html32TextWriter" />

Also, if the tagwriter capability is missing, an Html32TextWriter is used.
From the Microsoft reference source:

internal HtmlTextWriter CreateHtmlTextWriterInternal(TextWriter tw) {
    Type tagWriter = TagWriter;
    if (tagWriter != null) {
        return Page.CreateHtmlTextWriterFromType(tw, tagWriter);
    }

    // Fall back to Html 3.2
    return new Html32TextWriter(tw);
}

The Html32TextWriter declares that no div should be rendered around hidden input fields.
From the Microsoft reference source:

internal override bool RenderDivAroundHiddenInputs {
    get {
        return false;
    }
}

The HtmlTextWriter, by contrast, returns true for RenderDivAroundHiddenInputs (see the Microsoft reference source), which is why the wrapping div appears for user agents that are mapped to it.

Some more reading about all this here.


What you can do.

If you always want the wrapping div, use one of the well-known user agents; otherwise, use a custom one like the "Test Client" you are already using.
If you control the website being requested, you can set up a custom .browser file for your custom user agent ... but I would rather not go that way ...
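
For illustration only, a custom browser definition for the second option might look roughly like the sketch below. It follows the same format as the chrome.browser excerpt above. The id "TestClient" and the match pattern are hypothetical, and this assumes the file is placed in the application's App_Browsers folder; it has not been tested against the site in question.

```xml
<browsers>
    <!-- Hypothetical definition: applies to any user agent containing "Test Client" -->
    <browser id="TestClient" parentID="Default">
        <identification>
            <userAgent match="Test Client" />
        </identification>

        <capabilities>
          <!-- Selecting HtmlTextWriter should make the div around hidden inputs render -->
          <capability name="tagwriter" value="System.Web.UI.HtmlTextWriter" />
        </capabilities>
    </browser>
</browsers>
```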

When making the request, just set the appropriate User-Agent request header on your HttpClient, e.g.:

var client = new HttpClient();
var userAgent = "Test Client"; // Or "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36 Edg/83.0.478.37"
client.DefaultRequestHeaders.Add("User-Agent", userAgent);

pfx answered Oct 19 '22 10:10


This can happen for a number of reasons. One of the most likely is the one that @thangadurai mentioned: there may be a script that runs when the HTML loads and changes its content. This could be avoided by using a UI testing framework such as Selenium, or by driving headless Chrome programmatically.

Another possible reason is a User-Agent dependent implementation. This can be solved simply by changing the User-Agent header.

EDIT: If you control the web page, you could probably disable ViewState, if that is an option. The behavior appears to be based on capabilities detected from the User-Agent. For your processing, you could pick either user agent string and send it consistently with every request, though that might not be fully reliable. Another approach is to avoid line-based parsing and use a regular expression to match the specific tags you need. The details of how the rendering is decided were nicely described in @pfx's answer.
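
As a rough illustration of the regular-expression approach, the sketch below reads the value of a hidden input by name, so the result is the same whether or not the aspNetHidden div is present. The class and method names (HiddenFieldReader, GetHiddenFieldValue) are made up for this example. It assumes the name attribute comes before the value attribute and that both are double-quoted, which is how Web Forms typically renders these fields, but that is an assumption rather than a guarantee.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HiddenFieldReader
{
    // Hypothetical helper: returns the value attribute of the <input> element
    // with the given name, or null if no such element is found.
    // Assumes name="..." appears before value="..." and both use double quotes.
    public static string GetHiddenFieldValue(string html, string name)
    {
        string pattern =
            "<input\\b[^>]*\\bname=\"" + Regex.Escape(name) + "\"[^>]*\\bvalue=\"([^\"]*)\"";

        Match match = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Calling HiddenFieldReader.GetHiddenFieldValue(html, "__VIEWSTATE") on the page source does not depend on line positions, so the extra lines introduced by the wrapping div no longer matter. For anything beyond a couple of known fields, a real HTML parser is likely to be more robust than a regular expression.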

fsacer answered Oct 19 '22 09:10