Random bits of information by a developer

28 April 2009

Dependency Primer

Every software project has dependencies: your own resources, your own APIs you've created, third party APIs and projects, etc. We've all had to deal with them at some point in our career. Recently while playing with Gradle (www.gradle.org) I've come to a realization about how they should be handled correctly. In this post, I'll talk about the two most prominent kinds of dependencies, their scopes, and where they fit into an application. I will not be discussing packaging or how the different scopes must be resolved to create a distributable / deployable artifact. Being a Java developer this is seen from within the Java space, but the concepts are universal to all software projects.

Dependency Types

There are basically two types or categories of dependencies: first level and transitive. First level dependencies are those resources that your application directly relies upon. Examples of first level dependencies include the language in which the project is written, entities your project directly uses such as an XML parser, images, and classes from a third party project. Transitive dependencies are dependencies of your first level dependencies. An example from the Java world could be commons-logging which is dependent on some logging implementation, commonly log4j or JDK logging. For commons logging the log implementation is a first level dependency, but for your project it's a transitive dependency. Another example may be an SAX XML parser (or any other XML parsing API). The SAX API is used directly by your project and is therefore a first level dependency, but it requires an implementation, possibly Xerces, which would be a transitive dependency of your project. This definition of dependencies has really been ingrained in me while using and trying Gradle (a build system [yes, another one] written in Groovy). In the past I've used Maven or Ivy (basically as a Maven alternative, but it does much more in the world of dependencies). Maven introduced me to the concept of transitive dependencies and how they fit into it's life cycle and therefore your project's life cycle. I owe a great deal to Maven for introducing me to the concepts, but Maven has a few blemishes (at least I think so) in the way it handles these different types of dependencies. Maven will by default (I don't think you can change this) include all of the transitive dependencies of the same scope in your classpath, which in my opinion, is incorrect. Here's an example taken from JSFUnit: pom.xml:
   <dependency>
      <groupId>net.sourceforge.cssparser</groupId>
      <artifactId>cssparser</artifactId>
      <scope>compile</scope>
   </dependency>
   <dependency>
      <groupId>net.sourceforge.nekohtml</groupId>
      <artifactId>nekohtml</artifactId>
      <scope>compile</scope>
   </dependency>
   <dependency>
      <groupId>xalan</groupId>
      <artifactId>xalan</artifactId>
      <scope>compile</scope>
   </dependency>
Dependency Tree:
[INFO] +- net.sourceforge.cssparser:cssparser:jar:0.9.5:compile
[INFO] |  \- org.w3c.css:sac:jar:1.3:compile
[INFO] +- net.sourceforge.nekohtml:nekohtml:jar:1.9.9:compile
[INFO] |  \- xerces:xercesImpl:jar:2.8.1:compile
[INFO] +- xalan:xalan:jar:2.7.0:compile
[INFO] |  \- xml-apis:xml-apis:jar:1.0.b2:compile
For this project cssparser, nekohtml and xalan have been configured as first level dependencies, but the effective classpath contains their compile (first level) dependencies as well. If the project relies on these libraries it should state them explicitly and not rely on the crutch of having them as transitive dependencies. Ivy usage can fall into the same problem, but this is not the case in Gradle (at least not without changing the default), which I believe is the correct way of handling first level dependencies. With Gradle the compile scoped (more in the next section) is not resolved transitively, so you'll have compile time errors if you have not declared a needed dependency.

Life cycle Scopes and Dependencies

The build life cycle for a software project can be distilled into three phases: compile (if needed), test, and runtime. Test is a little special because it contains two phases itself: testCompile and testRuntime, which are extensions of compile and runtime. So where do the different types of dependencies come into play? Your first level dependencies become your compile dependencies and runtime dependencies are pretty much your compile dependencies with transitive dependencies and a few other things that may be provided for you like container provided dependencies (though, those are arguably transitive dependencies of any third party dependency) such as a messaging provider, an HTTP implementation, transaction support, etc. The test dependencies extend compile and possibly runtime, and add their own dependencies for testing: a testing framework, mocking framework, possibly a slimed down server, and others, which of course would be first level dependencies for your tests.

Summary

To recap, there are two different kinds of dependencies: first level and transitive. First level are dependencies needed to build an run your project. Transitive dependencies are those dependencies of your dependencies. A software build life cycle essentially has three phases: compile, test, run. The compile phase should only use your first level dependencies. Runtime extends compile and is resolved transitively. Test extends both compile and runtime (though at different times) and uses its own dependencies as well. I hope this has been informative and helped others understand the relationship and distinction of dependencies and a software project.

No comments: