Last night a customer called, frantic, because Google had cached versions of private employee information. The information is not available unless you log in.
They had done a Google search for their domain, e.g.:
site:example.com
and noticed that Google had crawled, and cached, some internal pages.
Looking at the cached versions of the pages myself:
This is Google's cache of https://example.com/(F(NSvQJ0SS3gYRJB4UUcDa1z7JWp7Qy7Kb76XGu8riAA1idys-nfR1mid8Qw7sZH0DYcL64GGiB6FK_TLBy3yr0KnARauyjjDL3Wdf1QcS-ivVwWrq-htW_qIeViQlz6CHtm0faD8qVOmAzdArbgngDfMMSg_N4u45UysZxTnL3d6mCX7pe2Ezj0F21g4w9VP57ZlXQ_6Rf-HhK8kMBxEdtlrEm2gBwBhOCcf_f71GdkI1))/ViewTransaction.aspx?transactionNumber=12345. It is a snapshot of the page as it appeared on 15 Sep 2013 00:07:22 GMT
I was confused by the long url. Rather than:
https://example.com/ViewTransaction.aspx?transactionNumber=12345
there was a long string inserted:
https://example.com/[...snip...]/ViewTransaction.aspx?transactionNumber=12345
It took me a few minutes to remember: that might be a symptom of ASP.net's "cookie-less sessions". If your browser does not support cookies, the web-site will embed the session identifier in the URL instead.
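For reference, cookieless sessions normally have to be opted into explicitly; a minimal sketch of what enabling them in web.config would look like (this line does not exist in our config):

<!-- hypothetical: explicitly enabling cookieless sessions -->
<sessionState cookieless="true" />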
Except our site doesn't use that.
And even if our site did have cookie-less sessions auto-detected, and Google managed to cajole the web-server into handing it a session in the url, how did it take over another user's session?
The site has been crawled by bots for years. And this past May 29 was no different.
Google usually starts its crawl by checking the robots.txt file (we don't have one). But nobody is allowed to read anything on the site (including robots.txt) without first being authenticated, so it fails:
Time     Uri                     Port User Name        Status
======== ======================= ==== ================ ======
1:33:04  GET /robots.txt         80                    302    ;not authenticated, see /Account/Login.aspx
1:33:04  GET /Account/Login.aspx 80                    302    ;use https please
1:33:04  GET /Account/Login.aspx 443                   200    ;go ahead, try to login
All that time Google was looking for a robots.txt file. It never got one. Then it returns to try to crawl the root:
Time     Uri                     Port User Name        Status
======== ======================= ==== ================ ======
1:33:04  GET /                   80                    302    ;not authenticated, see /Account/Login.aspx
1:33:04  GET /Account/Login.aspx 80                    302    ;use https please
1:33:04  GET /Account/Login.aspx 443                   200    ;go ahead, try to login
And another check of robots.txt on the secure site:
Time     Uri                     Port User Name        Status
======== ======================= ==== ================ ======
1:33:04  GET /robots.txt         443                   302    ;not authenticated, see /Account/Login.aspx
1:33:04  GET /Account/Login.aspx 443                   200    ;go ahead, try to login
And then the stylesheet on the login page:
Time     Uri                     Port User Name        Status
======== ======================= ==== ================ ======
1:33:04  GET /Styles/Site.css    443                   200
And that's how every crawl from GoogleBot, msnbot, and BingBot works. Robots, login, secure, login. Never getting anywhere, because it cannot get past WebForms Authentication. And all is well with the world.
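For reference, the redirect-everything-to-login behaviour in these logs is what a standard WebForms deny-anonymous setup produces. A minimal sketch (the exact rules in the real web.config may differ):

<system.web>
    <authentication mode="Forms">
        <forms loginUrl="~/Account/Login.aspx" />
    </authentication>
    <authorization>
        <!-- deny anonymous users; this is what generates the 302s above -->
        <deny users="?" />
    </authorization>
</system.web>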
Until one day, GoogleBot shows up, with a Session cookie in hand!
Time     Uri                       Port User Name              Status
======== ========================= ==== ====================== ======
1:49:21  GET /                     443  [email protected] 200    ;they showed up logged in!
1:57:35  GET /ControlPanel.aspx    443  [email protected] 200    ;now they're crawling that user's stuff!
1:57:35  GET /Default.aspx         443  [email protected] 200    ;back to the homepage
2:07:21  GET /ViewTransaction.aspx 443  [email protected] 200    ;and here comes the private information
The user, [email protected], had not been logged in for over a day. (I was hoping that IIS had given the same session identifier to two simultaneous visitors, separated by an application recycle.) And our site (web.config) is not configured to enable cookieless sessions. And the server (machine.config) is not configured to enable cookieless sessions.
So:
As recently as October 1 (4 days ago), the GoogleBot was still showing up, cookie in hand, logging in as this user, crawling, caching, and publishing some of their private details.
How is Google, a non-malicious web-crawler, bypassing WebForms authentication?
IIS7, Windows Server 2008 R2, single server.
The server is not configured to give out cookieless sessions. But ignoring that fact, how can Google bypass authentication?
Perhaps someone, somehow, leaked [email protected]'s cookieless session url? None of these theories are really plausible.
How can Google, a non-malicious web-crawler, bypass WebForms authentication and hijack a user's existing session?
I don't even know how an ASP.net web-site that is not configured to give out cookieless sessions could give out a cookieless session. Is it possible to back-convert a cookie-based session id into a cookieless session id? I could quote the relevant <sessionState> section of web.config and machine.config, and show that there is no <sessionState cookieless="true"> anywhere.
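(For completeness, the explicit opposite, locking session state to cookies only, would look something like this; the attribute is standard, our actual config simply omits it:)

<sessionState cookieless="UseCookies" />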
How does the web-server decide that the browser doesn't support cookies? I tried blocking cookies in Chrome, and I was never given a cookie-less session identifier. Can I simulate a browser that doesn't support cookies, in order to verify that my server is not giving out cookieless sessions?
Does the server decide cookieless sessions by User-Agent string? If so, I could set Internet Explorer with a spoofed UA.
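One way to test it, regardless of how the decision is made: send a request with a deliberately ancient User-Agent and inspect the raw redirect target. A sketch (the UA string is just an example of a downlevel browser, and example.com stands in for the real site):

using System;
using System.Net;

class CookielessProbe
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("https://example.com/");
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)";
        request.AllowAutoRedirect = false; // we want to see the raw 302 Location header

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // A Location header containing an (S(...)) or (F(...)) segment means
            // the server is embedding session/auth identifiers in the URL.
            Console.WriteLine("{0} -> {1}",
                (int)response.StatusCode, response.Headers["Location"]);
        }
    }
}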
Does session identity in ASP.net depend solely on the cookie? Can anyone, from any IP, with the cookie-url, access that session? Does ASP.net not, by default, also take the IP address into account?
If ASP.net does tie IP address with the session, wouldn't that mean that the session couldn't have originated from the employee at their home computer? Because then when the GoogleBot crawler tried to use it from a Google IP, it would have failed?
Have there been any instances anywhere (besides the one I linked) of ASP.net giving out cookieless sessions when it's not configured to? Is there a Microsoft Connect issue on this?
Is Web-Forms authentication known to have issues, and should it not be used for security?
Edit: Removed the name of the Google bot that bypassed authentication, because people kept confusing Google, the name of the crawler, with something else. I use Google, the name of the crawler, as a reminder that it was a non-malicious web-crawler that managed to crawl its way into another user's WebForms session. This is to contrast it with a malicious crawler that was deliberately trying to break into another user's session.
Though the question mainly references session identifiers, the length of the identifier struck me as unusual.
There are at least two types of cookie/cookieless operations that can modify the query string to include an ID.
They are completely independent of each other (as far as I can tell).
A cookieless session allows the server to access session state data based on a unique ID in the URL rather than a unique ID in a cookie. Functionally this works fine, though ASP.Net reuses session IDs, which makes the cookieless variant more prone to session fixation attacks (a separate topic, but worth knowing about).
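As I understand the URL format, the two mechanisms are distinguishable at a glance (placeholders shown instead of real identifiers):

https://example.com/(S(<session-id>))/ViewTransaction.aspx    <- cookieless session state
https://example.com/(F(<auth-ticket>))/ViewTransaction.aspx   <- cookieless forms authentication

Note the (F(...)) prefix in the cached URL quoted in the question.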
Does session identity in ASP.net depend solely on the cookie? Can anyone, from any IP, with the cookie-url, access that session? Does ASP.net not, by default, also take the IP address into account?
The session ID is all that is required; by default, ASP.net does not tie a session to an IP address or user agent.
General Session Security Reading
Based on the length of the example data, I'm guessing your URL actually contains a forms authentication value, not a session ID. The source code suggests that cookieless mode is not something you must explicitly enable.
/// <summary>ASP.NET determines whether to use cookies based on
/// <see cref="T:System.Web.HttpBrowserCapabilities" /> setting.
/// If the setting indicates that the browser or device supports cookies,
/// cookies are used; otherwise, an identifier is used in the query string.</summary>
UseDeviceProfile
Here's how the determination is made:
// System.Web.Security.CookielessHelperClass
internal static bool UseCookieless( HttpContext context, bool doRedirect, HttpCookieMode cookieMode )
{
    switch( cookieMode )
    {
    case HttpCookieMode.UseUri:
        return true;
    case HttpCookieMode.UseCookies:
        return false;
    case HttpCookieMode.AutoDetect:
    {
        // omitted for length
        return false;
    }
    case HttpCookieMode.UseDeviceProfile:
        if( context == null )
        {
            context = HttpContext.Current;
        }
        return context != null && ( !context.Request.Browser.Cookies || !context.Request.Browser.SupportsRedirectWithCookie );
    default:
        return false;
    }
}
Guess what the default is? HttpCookieMode.UseDeviceProfile. ASP.Net maintains a list of devices and their capabilities. This list is notoriously unreliable; for example, IE11 gives a false positive for being a downlevel browser on par with Netscape 4.
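A quick way to see what the device profile concluded about any given request is to dump the detected capabilities from inside a page or handler (diagnostic only):

// Diagnostic: what did ASP.NET's browser capability detection decide
// for the current request?
HttpBrowserCapabilities browser = Request.Browser;
Response.Write(string.Format(
    "Browser={0}, Cookies={1}, SupportsRedirectWithCookie={2}",
    browser.Browser, browser.Cookies, browser.SupportsRedirectWithCookie));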
I think Gene's explanation is very likely; Google found the URL from some user action and crawled it.
It's completely conceivable that the Google bot is deemed to not support cookies. But this doesn't explain the origin of the URL, i.e. what user action resulted in Google seeing a URL with an ID already in it? A simple explanation could be a user with a browser that was deemed to not support cookies. Depending on the browser, everything else could look fine to the user.
The timing, i.e. the duration of validity, seems long, though I'm not that familiar with how long authentication tickets are valid or under what circumstances they can be renewed. It's entirely possible ASP.Net continued to reissue/renew tickets as it would for a continually active user.
I'm making a lot of assumptions here, but if I'm correct:
Explicitly disable cookieless behavior by using HttpCookieMode.UseCookies.
web.config:
<authentication mode="Forms">
<forms loginUrl="~/Account/Login.aspx" name=".ASPXFORMSAUTH" timeout="26297438"
cookieless="UseCookies" />
</authentication>
While this should resolve the behavior, you might investigate extending the forms authentication HTTP module and adding additional validation (or at least logging/diagnostics).
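A minimal sketch of that idea (the module and property names here are my own, and storing the login IP in the ticket's UserData is assumed to happen in your login code):

using System;
using System.Web;
using System.Web.Security;

// Rejects forms-authenticated requests whose client IP no longer matches
// the IP recorded when the ticket was issued.
public class AuthTicketIpCheckModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.PostAuthenticateRequest += OnPostAuthenticateRequest;
    }

    private void OnPostAuthenticateRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        FormsIdentity identity = context.User == null
            ? null
            : context.User.Identity as FormsIdentity;
        if (identity == null)
        {
            return; // not a forms-authenticated request
        }

        // Assumption: the login page stored Request.UserHostAddress in the
        // ticket's UserData when it issued the ticket.
        string issuedToIp = identity.Ticket.UserData;
        string currentIp = context.Request.UserHostAddress;

        if (!string.IsNullOrEmpty(issuedToIp) && issuedToIp != currentIp)
        {
            // Ticket replayed from a different address: force re-authentication.
            FormsAuthentication.SignOut();
            context.Response.Redirect(FormsAuthentication.LoginUrl, true);
        }
    }

    public void Dispose()
    {
    }
}

Note that pinning tickets to an IP can break legitimate users behind rotating proxies or mobile networks, so logging mismatches before enforcing the redirect may be the safer first step.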