Basic authentication encoding when credentials contain Unicode

I have been banging my head against the wall on this problem. I've read similar posts and articles; most suggest setting URIEncoding to UTF-8 in Tomcat's server.xml file, but that doesn't seem to make a difference here.

I have a RESTful web service deployed to a test environment where it is hosted on Tomcat 7. Tomcat is configured to use Java 6, although Java 7 is installed on the machine as well. When I run basic authentication tests against the service hosted there, login fails with HTTP status 401 whenever the original credentials contain Unicode characters. Basic auth works fine when the credentials contain only ASCII. I can also log in without using basic auth at all, because my service supports custom login headers and RFC 2047. With that approach it doesn't matter whether the credentials contain Unicode or not; logging in is not a problem.
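For reference, a Basic credential is built by joining the user name and password with a colon, converting that string to bytes, and Base64-encoding the bytes. The step that matters here is the string-to-bytes conversion, because it needs a charset. A minimal client-side sketch, assuming Java 8+ for `java.util.Base64` (on Java 6 a library such as Apache Commons Codec would be needed); the class and method names are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    /** Builds an Authorization header value, converting "user:password" to bytes with the given charset. */
    public static String build(String user, String password, Charset charset) {
        byte[] raw = (user + ":" + password).getBytes(charset);
        // Base64 output is always ASCII, whatever charset produced the bytes.
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        String user = "SampleUser-\u00EA"; // "SampleUser-ê"
        // Same credentials, two different (but equally ASCII) header values:
        System.out.println(build(user, "secret", StandardCharsets.UTF_8));
        System.out.println(build(user, "secret", StandardCharsets.ISO_8859_1));
    }
}
```

Both header values are valid ASCII, so nothing on the wire reveals which charset the client used; the server has to decode with the same charset to get the original user name back.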

Specifically, the "problem" appears to be that the username is being UTF-8 encoded twice. There is a bug in my logger (a separate issue) that makes the log files ANSI-encoded; when you convert the log files to UTF-8, the characters appear correctly. In this case, however, the problematic username is much longer than it should be, and after the file is converted to UTF-8 it looks the way a correctly logged username looks before conversion. For example:

  • BAD (BASIC AUTH): SampleUser-¢𣎴eÌ‚é¾± -> SampleUser-¢𣎴eÌ‚é¾±
  • GOOD (RFC 2047): SampleUser-¢𣎴eÌ‚é¾± -> SampleUser-¢𣎴ê龱
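The garbled form in these samples can be reproduced by encoding a string as UTF-8 and decoding the resulting bytes with a single-byte Windows charset. A minimal sketch, assuming the logger's "ANSI" means windows-1252 and Java 7+ for `StandardCharsets`; the class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    /** Simulates UTF-8 bytes being read back with a single-byte charset. */
    public static String misread(String original, Charset wrong) {
        return new String(original.getBytes(StandardCharsets.UTF_8), wrong);
    }

    /** Reverses misread(): recovers the original bytes and decodes them as UTF-8. */
    public static String repair(String garbled, Charset wrong) {
        // Lossless only if every byte is defined in the wrong charset;
        // windows-1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset ansi = Charset.forName("windows-1252");
        String garbled = misread("\u9FB1", ansi); // U+9FB1 is the last character of the sample name
        // garbled now holds "\u00E9\u00BE\u00B1", i.e. the "é¾±" seen in the log
        System.out.println(garbled.equals("\u00E9\u00BE\u00B1"));
        System.out.println(repair(garbled, ansi).equals("\u9FB1"));
    }
}
```

Each extra round of this misreading inflates the string again, which is consistent with a user name that looks "much longer than it should be".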

The real kicker is that I have my own local instance of Tomcat 7 (Java 6), and I cannot reproduce the problem against it. I've compared the conf directories of the two Tomcat installations and they appear to be identical. I can't figure out why basic auth works in one environment and not the other. I'm running the tests from my own machine, so the difference can't be due to a discrepancy in the way I'm testing it (JUnit/JSystem).

Here's what I know:

  • User privileges don't matter; Unicode in the username is the problematic factor.
  • It doesn't matter whether the request is sent via XML or JSON. My service supports both types of serialization.
  • The Accept-Charset and Content-Type (if applicable) headers are both set to UTF-8 on the request.
  • The Java system properties are the same in both environments.
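One way to double-check the settings that matter most here is to log them from inside each deployed web application rather than from a separate JVM on the same machine, since Tomcat's startup scripts (e.g. via `JAVA_OPTS` or `CATALINA_OPTS`) can pass different flags. A small sketch; the class name is hypothetical, and `sun.jnu.encoding` is a JDK-internal property that may be absent:

```java
import java.nio.charset.Charset;

public class CharsetDiagnostics {
    /** Summarizes the JVM settings used by byte/String conversions that omit a charset. */
    public static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding")
                // May print "null" if this JVM does not set the property.
                + ", sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If `defaultCharset` differs between the two servers, any code that calls `new String(bytes)` without a charset will behave differently on each.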

The following articles are very interesting to me, because they suggest combining RFC 2047 and basic auth. I didn't think that would be necessary, because the basic auth string itself contains only ASCII (it is Base64-encoded). Even if it were necessary, why would it be required on one Tomcat server and not another? Pursuing this combination approach feels like it doesn't address the root problem, which is what's really driving me mad!

  • http://www.mentby.com/Group/tomcat-user/basic-authentication-failed-with-multibyte-username.html
  • What encoding should I use for HTTP Basic Authentication?
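The combination those articles describe would look roughly like this: each credential is first wrapped as an RFC 2047 "B" encoded-word (`=?UTF-8?B?...?=`), so the bytes under the outer Base64 are pure ASCII and the charset travels inside each word. This is a sketch under stated assumptions (Java 8+, hypothetical names), and it only works if the server side is also changed to unwrap encoded-words; neither Tomcat nor the service above is known to do that inside Basic credentials.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodedWordBasicAuth {
    /** Wraps a value as an RFC 2047 "B" encoded-word: =?UTF-8?B?<base64 of UTF-8 bytes>?= */
    public static String encodedWord(String value) {
        String b64 = Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
        return "=?UTF-8?B?" + b64 + "?=";
    }

    /** Builds a Basic header whose user name and password are wrapped as encoded-words first. */
    public static String header(String user, String password) {
        // The Base64 alphabet has no ':', so the user/password separator stays unambiguous.
        String credentials = encodedWord(user) + ":" + encodedWord(password);
        // Everything is ASCII at this point, so the choice of charset here cannot lose data.
        byte[] ascii = credentials.getBytes(StandardCharsets.US_ASCII);
        return "Basic " + Base64.getEncoder().encodeToString(ascii);
    }
}
```

RFC 2047 limits an encoded-word to 75 characters, so very long values would need to be split; the sketch ignores that.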

Thanks in advance for suggestions on things to try or double-check. My access to the test environment is limited; I can only experiment with it during off-hours, so I apologize in advance if I don't respond promptly.

lkee00 asked Oct 06 '22 10:10


1 Answer

From the data you provided, it actually seems like the UTF-8 data is being converted to an ASCII-based single-byte encoding (such as ISO-8859-1 or windows-1252) rather than being UTF-8 encoded twice.

As for the actual issue, unfortunately basic authentication doesn't provide any way to transmit the charset of the username and password before they are Base64-encoded. Because of this, your main options are to assume and explicitly specify a charset, use the default charset from your environment, or define a custom way to communicate the charset (e.g. another header). Which option fits depends on how much control you have over the environment and over the client and server ends of the communication, and on whether you want all calls to use the same charset.
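A server-side sketch of the first and third options combined: read the charset from a custom header if the client sends one, otherwise fall back to an agreed fixed charset. The header (imagined here as something like `X-Credentials-Charset`) is hypothetical and not part of any standard, as are the class and method names; Java 7+ is assumed for `StandardCharsets`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CredentialCharsetPolicy {
    /**
     * Picks the charset for decoding Basic credentials.
     * headerValue is the value of a hypothetical custom charset header, or null if absent.
     */
    public static Charset choose(String headerValue) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            // Fixed, agreed-upon default. Deliberately not Charset.defaultCharset(),
            // which can differ from one server to the next.
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(headerValue.trim());
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return StandardCharsets.UTF_8;
        }
    }
}
```

Falling back to a fixed charset instead of the JVM default keeps the behaviour identical across servers, which is exactly the property that differs between the two Tomcats in the question.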

Since one server behaves correctly and the other doesn't, I'm assuming that decoding currently uses the default charset from the environment. You're correct that the encoded string will only contain ASCII (so you're probably not seeing an issue transmitting the encoded value); the data is more likely being lost during (or after) the decoding process. Depending on which library you're using, it probably produces either a byte array or a String, so be sure you are providing the charset when creating a String from the byte array (e.g. new String(decodedData, someCharset)), or see if there's a way to provide it to the library (if it produces a String).
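A minimal sketch of decoding the header with an explicit charset, assuming Java 8+ for `java.util.Base64`; the class and method names are hypothetical and this is not Tomcat's own authenticator:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicCredentialsDecoder {
    /** Returns {user, password}, or null if the header is not a well-formed Basic header. */
    public static String[] decode(String authorization, Charset charset) {
        // The scheme name is case-insensitive.
        if (authorization == null || !authorization.regionMatches(true, 0, "Basic ", 0, 6)) {
            return null;
        }
        byte[] decoded;
        try {
            decoded = Base64.getDecoder().decode(authorization.substring(6).trim());
        } catch (IllegalArgumentException e) {
            return null; // not valid Base64
        }
        // The key step: pass the charset explicitly instead of new String(decoded),
        // which silently uses the JVM's default charset.
        String credentials = new String(decoded, charset);
        int colon = credentials.indexOf(':'); // Basic user names cannot contain ':'
        if (colon < 0) {
            return null;
        }
        return new String[] { credentials.substring(0, colon), credentials.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String header = "Basic " + Base64.getEncoder()
                .encodeToString("\u00EA:pw".getBytes(StandardCharsets.UTF_8));
        // Matching charset recovers "ê"; a single-byte charset yields "Ãª" instead.
        System.out.println(decode(header, StandardCharsets.UTF_8)[0].equals("\u00EA"));
        System.out.println(decode(header, StandardCharsets.ISO_8859_1)[0].equals("\u00C3\u00AA"));
    }
}
```

Decoding the same header with a mismatched charset does not fail; it quietly returns a different user name, which then fails the lookup and produces a 401.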

Adam answered Oct 10 '22 02:10