An application I'm maintaining loads user agents extracted from web logs into a MySQL table column using the 'latin1' charset. Occasionally, it fails to load a user agent that looks like this:
Mozilla/5.0 (Iâ?; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML^C like Gecko) Version
I suspect it's choking on Iâ?. I'm working to figure out whether this should be supported, or whether it's corruption introduced by the upstream logging system. Is this a legal user agent in an HTTP header?
When your browser connects to a website, it includes a User-Agent field in the HTTP request header. The contents of this field vary from browser to browser. Websites use this information to serve different content to different web browsers and operating systems.
Many people use a user agent spoofing extension or plugin for Chrome to change their user agent string on the fly, a popular method of testing websites or browser compatibility. Some marketers also use user agent spoofing to see how their ads, for example display campaigns, appear in different browsers.
A Mail User Agent (MUA), also referred to as an email client, is a computer application that allows you to send and retrieve email. A MUA is what you interact with, as opposed to an email server, which transports email.
Chrome uses a WebKit-derived engine to render HTML, but to stop websites from showing a recommendation for Internet Explorer, "like Gecko" was added to its user agent string.
RFC 2616 (HTTP 1.1) says that message header contents must be "consisting of either *TEXT or combinations of token, separators, and quoted-string". If you look at the definitions for TEXT etc. you will find that legal characters are those with byte values not in the [0, 31] range and not equal to 127; therefore characters such as â are, as far as I can tell, legal as per the spec.
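To make that rule concrete, here is a minimal Python sketch of the check as described above. The function name `is_legal_header_text` is made up for illustration, and the check deliberately follows the simplified reading in this answer (reject octets 0 to 31 and 127); it does not model the linear whitespace that the RFC grammar also permits.

```python
def is_legal_header_text(value: bytes) -> bool:
    """Return True if no octet is a control character (0-31 or 127).

    Hypothetical helper: implements only the simplified rule stated
    above, not the full RFC 2616 grammar (e.g. tabs are rejected here).
    """
    return all(b > 31 and b != 127 for b in value)


# The suspicious fragment from the question, as latin1 bytes.
fragment = "Mozilla/5.0 (Iâ?; CPU iPhone OS 5_0_1 like Mac OS X)".encode("latin-1")
print(is_legal_header_text(fragment))

# A raw control character (here 0x03) would fail the check.
print(is_legal_header_text(b"KHTML\x03 like Gecko"))
```

Under this reading, the â on its own is not what makes a header value illegal; a literal control character would be.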
Technically, octets > 127 are allowed in comments. RFC 2616 makes them default to ISO-8859-1, but HTTPbis (the upcoming revision of RFC 2616) has removed that rule so that sometime in the distant future we may be able to move to a sane encoding.
Recommendation: strip all octets > 127.
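One way to apply that recommendation before loading the value into the table is sketched below in Python. The helper name `strip_high_octets` is an assumption for illustration, and it presumes you have the user agent as raw bytes from the log.

```python
def strip_high_octets(value: bytes) -> bytes:
    """Drop every octet greater than 127, keeping the rest unchanged.

    Hypothetical helper; control characters (0-31, 127) are left alone
    and would need separate handling if they are also unwanted.
    """
    return bytes(b for b in value if b <= 127)


raw = "Mozilla/5.0 (Iâ?; CPU iPhone OS 5_0_1 like Mac OS X)".encode("latin-1")
cleaned = strip_high_octets(raw)

# Everything left is 7-bit, so decoding as ASCII cannot fail.
print(cleaned.decode("ascii"))
```

The result is plain ASCII, which is representable in a latin1 column as well as in most other charsets.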