Handling Node.js Dependencies At Box

When Box began adopting Node.js into our technology stack, we were quickly faced with a decision about how to manage dependencies. Open source projects typically rely on npm for all aspects of dependency management: new dependencies are published, existing dependencies are downloaded, and you trust that the npm registry will be there to help you along the way. And while that works fine for these projects, it's a much more complex proposition inside a company with deadlines.

Enterprise Node.js Constraints

We started using npm inside of Box to bring in external dependencies. Ironically, this happened around the same time that the public npm registry was down. We immediately realized that depending on a piece of infrastructure that we don't control, and aren't paying someone to maintain, doesn't give us the control over our release process that we need. We would have a lot of explaining to do if a deployment ended up canceled because the public npm registry was down.

So we stopped and thought about our requirements:

  • We must be able to deploy regardless of the public npm registry's availability.
  • We must have a way to create our own modules and share them internally without publishing to the public npm registry.
  • We must have access to public modules that we need as dependencies in our projects.

Private Registry?

We ended up with the same question a lot of companies have when adopting Node.js: do we set up a private npm registry? We consulted with T.J. Fontaine of Joyent, Dav Glass of Yahoo, the mobile team at LinkedIn, and the Kraken team at PayPal, asking them each how they dealt with this problem. These conversations were invaluable because we had a good range of company and project sizes represented. We really wanted to make the right decision that would get us up and running quickly as well as give us some immediate-term stability.

Clearly, we needed to reduce our dependency on the public npm registry as well as provide a way to share private modules inside of Box. A private registry would solve this problem, but there is the setup cost, the maintenance cost, and all of the operational questions that pop up when a new technology is added to a company. We would have to learn how the npm registry works, what the bottlenecks are, and what its data loss characteristics are. That's a lot of overhead just to get started.

We eventually decided that, for the time being, we would use GitHub Enterprise as our source for private modules. This allows us the flexibility to use the npm client with our own modules while providing consistent storage and backup for our code.
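As a rough illustration of how this can look in practice, the npm client accepts Git URLs as dependency specifiers, with a tag or other commit-ish after the `#`. The sketch below builds such a specifier. The host name `github.example.com`, the organization, the module names, and the `gitDependency` helper are all hypothetical and are not Box's actual setup.

```javascript
// Hypothetical helper: build an npm dependency specifier that points at a
// tagged release on an internal Git host instead of the public registry.
function gitDependency(host, org, repo, tag) {
  return `git+ssh://git@${host}/${org}/${repo}.git#${tag}`;
}

// Illustrative package.json contents mixing one private module (fetched
// from Git) with one public module (fetched from the public npm registry).
const pkg = {
  name: 'example-web-app',
  version: '1.0.0',
  dependencies: {
    'internal-logger': gitDependency(
      'github.example.com', // assumed GitHub Enterprise host
      'platform',           // assumed organization
      'internal-logger',    // assumed repository
      'v1.2.0'              // Git tag to install
    ),
    'some-public-module': '1.0.0',
  },
};

console.log(JSON.stringify(pkg, null, 2));
```

Pinning to a tag rather than a branch keeps installs repeatable, since a branch can move between one install and the next.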

But that only solved the storage problem. Our projects have both private and public dependencies, which still implies a dependency on the public npm registry. Our next step was to mitigate that concern.

Checking In Dependencies

Almost every project has some dependency on a public resource. Whether that resource is the public npm registry, or the jQuery documentation, or the ability to search Google for the right approach to something, you are still beholden to something that you don't control to get your job done. We accept this because of the productivity gains from using these resources and, in turn, we experience productivity loss when they become unavailable. So the question isn't really how to eliminate dependence on all public resources, but rather, how to mitigate the pain when one of those resources is unavailable.

The most important requirement in this regard is deploying without dependence on the public npm registry. Since we didn't set up a private registry, our only real option was to check in the `node_modules` directory. That would mean a source code checkout is completely self-contained, testable, and deployable without relying on the public npm registry. While some discourage checking in dependencies, this was the only real option available to us.

But would that pattern hold for all of our Node.js projects?

Node.js Project Types

After doing a survey of the Node.js projects at Box, it became apparent that we had only three project types, each with different lifecycles:

  • Libraries - these are the private npm modules we create and are intended to be used as a dependency on a larger project. Libraries are never deployed on their own, only as part of another project. They tend to have short lifecycles and long lifespans, which is to say they tend to be created and undergo rapid changes initially and then stabilize to the point where they are rarely updated except for bug fixes or occasional new features.
  • Command Line Utilities - things that we end up installing in our development environments. Both private and public command line utilities are bundled into RPMs before being installed. These also tend to go through a bunch of initial development and then slow down quickly. They are updated periodically for major changes but otherwise don't undergo a lot of change after initial deployment.
  • Web Applications - these are the products we offer customers, the things that we deploy into a production environment. Development for web applications is ongoing and frequent, and tends to involve a team rather than an individual.

In figuring out whether dependencies needed to be checked in, we looked closely at these three project types and their lifecycles.

What We Do Today

We wanted to err on the side of checking in as little as possible while still meeting our requirements. Checking in dependencies means creating larger commits, and unfortunately, code not written by Boxers ends up front-and-center in our code review system. That's a cost we are willing to accept if necessary, but we want to minimize it whenever possible.

For libraries, since they are never deployed on their own and have short lifecycles, we determined it wasn't necessary to check in dependencies. When a Boxer is working on a command line utility or web application, installing a library through npm means that the library's dependencies are installed into that project's `node_modules` directory along with it. If the public npm registry is down, that may interfere with our ability to upgrade a library, but this is the same as for a published module.

We also don't check in dependencies for command line utilities. When we package these utilities into RPMs, all of the dependencies are downloaded and are present in the final RPM. The RPM then becomes something that can be deployed offline. So if the public npm registry is down, we are prevented from creating an RPM, but the frequency with which new Node.js RPMs are created and updated is such that both the likelihood and impact are minimal.

Web applications are the only Node.js projects that have checked-in dependencies at Box. This goes back to the requirement of deploying even when the public npm registry is down. At any given time, what's checked in to source control is ready to be deployed into production on its own. That includes any libraries we wrote as well as any publicly available modules. If the public npm registry is down, that will prevent us from easily upgrading our dependencies, but not from deploying what is already checked in.
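One way to sanity-check that a checkout really is self-contained is to compare the dependencies declared in `package.json` against what is present in `node_modules`. The following is a minimal sketch of that idea, not a description of Box's tooling. The `findMissingDependencies` function and the `lib-a` and `lib-b` module names are hypothetical, and the check only covers top-level dependencies, not nested ones.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Return the names of declared dependencies that have no package.json under
// <projectDir>/node_modules. An empty array suggests that the top-level
// dependencies are all present in the checkout.
function findMissingDependencies(projectDir) {
  const pkg = JSON.parse(
    fs.readFileSync(path.join(projectDir, 'package.json'), 'utf8')
  );
  const declared = Object.keys(pkg.dependencies || {});
  return declared.filter((name) => {
    const manifest = path.join(projectDir, 'node_modules', name, 'package.json');
    return !fs.existsSync(manifest);
  });
}

// Demo against a throwaway project so the sketch runs anywhere:
// two dependencies are declared, but only lib-a is "checked in".
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'checked-in-deps-'));
fs.writeFileSync(
  path.join(demoDir, 'package.json'),
  JSON.stringify({ dependencies: { 'lib-a': '1.0.0', 'lib-b': '1.0.0' } })
);
fs.mkdirSync(path.join(demoDir, 'node_modules', 'lib-a'), { recursive: true });
fs.writeFileSync(
  path.join(demoDir, 'node_modules', 'lib-a', 'package.json'),
  JSON.stringify({ name: 'lib-a', version: '1.0.0' })
);

console.log(findMissingDependencies(demoDir)); // [ 'lib-b' ]
```

A check like this could run before deployment and fail the build whenever the returned list is non-empty.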

Emergency Upgrades

An emergency upgrade of a dependency is rare, and an emergency addition of one is only a little less rare, but both can happen from time to time. An emergency situation typically involves a significant application problem (such as a security issue) while the public npm registry is down. This can affect upgrading or adding new dependencies, both public ones and private ones (which may depend on public resources). Even though the npm client is the easiest way to upgrade, we still have options for any web application:

  • For internal libraries, we can take the source directly from source control and insert it into the web application repository. We update `package.json` manually in this scenario and ensure it's pointing to the correct Git tag for the new version.
  • For public modules, we can also find their source code online, since most modules are open source. This can be complex if the module has upgraded its own dependencies but still possible to do by hand (just go to each of those modules' repositories and do the same thing).

In both cases we have the option of changing the dependencies by hand. Sometimes this is the easiest way to fix a problem without adding a lot of complexity. That necessitates doing the real upgrade once it's possible, but at least we can "stop the bleeding" if necessary.
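The manual `package.json` update for an internal library amounts to changing the Git tag at the end of the dependency specifier. The sketch below shows one way to do that, assuming the dependency is declared as a Git URL with the tag after `#`. The `pinToTag` helper, the host, and the module name are hypothetical.

```javascript
// Hypothetical helper: return a copy of a parsed package.json object in which
// the named dependency points at a different Git tag. Anything after '#' in
// the existing specifier is replaced, and the input object is left untouched.
function pinToTag(pkg, name, tag) {
  const current = pkg.dependencies && pkg.dependencies[name];
  if (typeof current !== 'string') {
    throw new Error(`${name} is not a declared dependency`);
  }
  const base = current.split('#')[0];
  return {
    ...pkg,
    dependencies: { ...pkg.dependencies, [name]: `${base}#${tag}` },
  };
}

// Illustrative manifest; the Git URL is an assumption, not a real repository.
const before = {
  name: 'example-web-app',
  dependencies: {
    'internal-logger':
      'git+ssh://git@github.example.com/platform/internal-logger.git#v1.2.0',
  },
};

const after = pinToTag(before, 'internal-logger', 'v1.2.1');
console.log(after.dependencies['internal-logger']);
```

The library source copied into `node_modules` should come from the same tag, so that `package.json` and the checked-in code agree once the real upgrade is done later.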

Conclusion

We're presenting this information not as a definitive guide to managing Node.js dependencies, but rather as a way to share what's been working for us and our decision-making process around it. Managing dependencies is an important part of any large application regardless of the technology stack. Our needs right now have led us to our current strategy, and as we continue Node.js development, we are likely to adjust how we do things along the way.