
Handling changes in dependent 3rd party libraries

I have a project which depends on several 3rd party libraries, the project itself is packaged as a jar and distributed to other developers as a library. Those developers add the dependencies to their classpath and use my library in their code.

Recently I had an issue with one of the 3rd party dependencies, the Apache Commons Codec library. The problem is this:

byte[] arr = "hi".getBytes();
// Codec Version 1.4
Base64.encodeBase64String(arr) == "aGk=\r\n" // this is true

// Codec Version 1.6
Base64.encodeBase64String(arr) == "aGk=" // this is true

As you can see the output of the method has changed with the minor version bump.

My question is this: I don't want to force users of my library onto a specific minor version of a 3rd party library. Assuming I know about the change in the dependency, is there any way I can detect which version of the library is on the classpath and behave accordingly? Alternatively, what is considered best practice for this kind of scenario?

P.S. - I know that for the above example I can just use new String(Base64.encodeBase64(data, false)), which is backwards compatible; this is a more general question.
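For this particular difference, one defensive option (a sketch, not tied to any codec version) is to normalize the encoded string before comparing or storing it, since a trailing CRLF is the only variation here:

```java
public class NormalizeExample {
    public static void main(String[] args) {
        // Simulated outputs of the two commons-codec versions
        String v14 = "aGk=\r\n"; // 1.4 appended a CRLF (chunked output)
        String v16 = "aGk=";     // 1.6 does not

        // Trimming trailing whitespace makes the two outputs comparable
        System.out.println(v14.trim().equals(v16)); // prints true
    }
}
```

This only papers over whitespace differences, of course; it does nothing for a dependency whose behaviour changes in a more substantive way.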

asked Feb 09 '12 by Asaf

2 Answers

You ask what is the "best practice" for this problem. I'm going to assume that by "this problem" you mean the problem of 3rd party library upgrades, and specifically, these two questions:

  1. When should you upgrade?

  2. What should you do to protect yourself against bad upgrades (like the commons-codec bug mentioned in your example)?

To answer the first question, "when should you upgrade?", many strategies exist in industry. In most of the commercial Java world I believe the dominant practice is "upgrade when you are ready to." In other words, as the developer, you first need to realize that a new version of a library is available (for each of your libraries!), you then need to integrate it into your project, and you are the one who makes the final go/no-go decision based on your own test bed: JUnit, regression, manual testing, whatever it is you do to ensure quality. Maven facilitates this approach (I call it version "pinning") by making multiple versions of most popular libraries available for automatic download into your build system, and by tacitly fostering this "pinning" tradition.

But other practices do exist. For example, within the Debian Linux distribution it is theoretically possible to delegate a lot of this work to the Debian package maintainers. You would simply dial in your comfort level according to the 4 levels Debian makes available, choosing newness over risk, or vice versa: OLDSTABLE, STABLE, TESTING, UNSTABLE. Unstable is remarkably stable, despite its name, and OLDSTABLE offers libraries that may be as much as 3 years out of date compared to the latest-and-greatest versions available on their original "upstream" project websites.

As for the 2nd question, how to protect yourself, I think the current 'best practice' in industry is twofold: choose your libraries based on reputation (Apache's is generally pretty good), and wait a little while before upgrading, e.g., don't always rush to be on the latest-and-greatest. Maybe choose a public release of the library that has already been available 3 to 6 months, in the hope that any critical bugs have been flushed out and patched since the initial release.

You could go further by writing JUnit tests that specifically protect the behaviours you rely on in your dependencies. That way, when you bring down the newer version of a library, your JUnit tests would fail right away, warning you of the problem. But I don't see a lot of people doing that, in my experience. And it's often difficult to be aware of the precise behaviour you are relying on.
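As a sketch of such a pinning test (using the JDK's java.util.Base64 so the example is self-contained; with commons-codec on the classpath you would call Base64.encodeBase64String instead):

```java
import java.util.Base64;

public class Base64BehaviourTest {
    public static void main(String[] args) {
        // Pin the exact encoded form we depend on, trailing newline included or not.
        // If a library upgrade changes this output, the check fails immediately.
        String encoded = Base64.getEncoder().encodeToString("hi".getBytes());
        if (!"aGk=".equals(encoded)) {
            throw new AssertionError("Base64 behaviour changed: got '" + encoded + "'");
        }
        System.out.println("behaviour pinned: " + encoded); // prints: behaviour pinned: aGk=
    }
}
```

The same pattern works for any dependency: hard-code the output you observed and tested against, not the output the spec merely permits.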

And, by the way, I'm Julius, the guy responsible for this bug! Please accept my apologies for this problem. Here's why I think it happened. I will speak only for myself. To find out what others on the apache commons-codec team think, you'll have to ask them yourself (e.g., ggregory, sebb).

  1. When I was working on Base64 in versions 1.4 and 1.5, I was very much focused on the main problem of Base64, that is, encoding binary data into the lower-127 ASCII range, and then decoding it back to binary.

  2. So in my mind (and here's where I went wrong) the difference between "aGk=\r\n" and "aGk=" is immaterial. They both decode to the same binary result!

  3. But thinking about it in a broader sense after reading your Stack Overflow post here, I realize there is probably a very popular use case that I never considered. That is, password checking against a table of encrypted passwords in a database. In that use case you probably do the following:

    // a.  store user's password in the database
    //     using encryption and salt, and finally,
    //     commons-codec-1.4.jar (with "\r\n").
    //

    // b.  every time the user logs in, encrypt their
    //     password using appropriate encryption alg., plus salt,
    //     finally base64 encode using latest version of commons-codec.jar,
    //     and then check against encrypted password in the database
    //     to see if it matches.

So of course this use case fails if commons-codec.jar changes its encoding behaviour, even in ways that are immaterial according to the base64 spec. I'm very sorry!
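A minimal simulation of that failure mode (pure JDK; the 1.4-style trailing CRLF is appended by hand for illustration):

```java
import java.util.Base64;

public class PasswordCheckExample {
    public static void main(String[] args) {
        byte[] hash = "hi".getBytes(); // stand-in for an encrypted, salted password

        // Stored at signup time with codec 1.4, which appended "\r\n"
        String stored = Base64.getEncoder().encodeToString(hash) + "\r\n";

        // Recomputed at login time with codec 1.6, which does not
        String login = Base64.getEncoder().encodeToString(hash);

        System.out.println(stored.equals(login));        // prints false: every login fails
        System.out.println(stored.trim().equals(login)); // prints true after normalizing
    }
}
```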

I think even with all of the "best practices" I spelled out at the beginning of this post, there's still a high probability of getting screwed on this one. Debian Testing already contains commons-codec-1.5, the version with the bug, and fixing this bug essentially means breaking things for people who used version 1.5, instead of people like you who used version 1.4. But I will try to put some documentation on the Apache website to warn people. Thanks for mentioning it here on Stack Overflow (am I right about the use case?).

ps. I thought Paul Grime's solution was pretty neat, but I suspect it relies on projects putting version info in the Jar's META-INF/MANIFEST.MF file. I think all Apache Java libraries do this, but other projects might not. The approach is a nice way to pin yourself to versions at build time, though: instead of realizing that you depend on the "\r\n", and writing the JUnit test that protects against that, you can instead write a much easier JUnit test: assertTrue(desiredLibVersion.equals(actualLibVersion)).

(This assumes run-time libs don't change compared to build-time libs!)
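A sketch of that version assertion, using the Package metadata the JVM exposes. It is shown here against a JDK class so the example runs without extra jars; for commons-codec you would query org.apache.commons.codec.binary.Base64.class.getPackage() instead, and hard-code the version you actually built and tested against:

```java
public class VersionPinExample {
    public static void main(String[] args) {
        // For a real dependency: org.apache.commons.codec.binary.Base64.class.getPackage()
        Package pkg = String.class.getPackage(); // java.lang, always present
        String actual = pkg.getSpecificationVersion();
        System.out.println("specificationVersion: " + actual);

        // The "pin": fail fast if the runtime version differs from the one you tested.
        // Here desired is copied from actual only so the sketch runs anywhere;
        // in real code it would be a hard-coded string like "1.4".
        String desired = actual;
        if (desired != null && !desired.equals(actual)) {
            throw new AssertionError("Unexpected library version: " + actual);
        }
    }
}
```

Note that this only works when the jar's manifest carries Specification-Version or Implementation-Version entries; jars built without them return null here.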

answered Oct 05 '22 by Julius Musseau

package stackoverflow;

import org.apache.commons.codec.binary.Base64;

public class CodecTest {
    public static void main(String[] args) {
        byte[] arr = "hi".getBytes();
        String s = Base64.encodeBase64String(arr);
        System.out.println("'" + s + "'");
        Package package_ = Package.getPackage("org.apache.commons.codec.binary");
        System.out.println(package_);
        System.out.println("specificationVersion: " + package_.getSpecificationVersion());
        System.out.println("implementationVersion: " + package_.getImplementationVersion());
    }
}

Produces (for v1.6):

'aGk='
package org.apache.commons.codec.binary, Commons Codec, version 1.6
specificationVersion: 1.6
implementationVersion: 1.6

Produces (for v1.4):

'aGk=
'
package org.apache.commons.codec.binary, Commons Codec, version 1.4
specificationVersion: 1.4
implementationVersion: 1.4

So you could use the Package object to test which version is on the classpath.

But I would say that it's a bit naughty for the API to have changed the way it did.

EDIT: Here is the reason for the change: https://issues.apache.org/jira/browse/CODEC-99.

answered Oct 05 '22 by Paul Grime