Beta Architecture: Scaling Developer Environments With Kubernetes

Here at Branch, we move fast. It’s so critical to the success of our company that it is one of our core values. We are the fastest moving company I’ve worked for.

One of the keys to moving fast is allowing our development teams to iterate and test their code quickly and consistently. This means our development, deployment, and release infrastructure is key to being able to move swiftly. That’s why we invest in making sure developers are productive. I have a personal passion for empowering developers, specifically because nothing is more demoralizing for developers than being needlessly held back by tools or process.

Problem: Local Development Environments Are Difficult to Manage

Our core apps are written in Node.js and call many services written primarily in Java. Our developers use Macs, but local development environments are inconsistent, and our server configuration management system was designed only for Linux and doesn’t work on the Mac. Developers self-manage the components on their Macs, including Node.js, correct versions of packages, build tools, and personal IDEs. Some developers also develop Java services, which require a different set of tools including Java, Maven, another IDE, and more package management tools. Setting a Mac up once isn’t hard, but local environments become a frequent and significant source of problems as tools, components, and services go out of date, causing developers to waste time troubleshooting version incompatibilities unique to each desktop. And as we develop new core services, the web of dependencies grows more complex with each one.

Failed Solution: Docker on the Desktop

Last year we attempted to use Docker on the desktop, as many organizations do, to create consistent and reproducible environments on Macs, but multiple attempts exposed the problems with this approach:

Docker on the desktop is difficult to manage. Exposing ports, running multiple containers, and linking containers are all difficult to do. Even with good tooling, diagnosing local problems is difficult and can be entangled with application issues.
Local Docker environments require developers to understand too much about Docker, Vagrant, and Docker networking.
Running many containers is too memory and CPU intensive for many desktops.
Managing local databases requires frequent schema maintenance, and differences between each developer’s database create difficult-to-reproduce bugs.
Developers are forced to either run every microservice or understand a complex web of service dependencies in order to run only the components they need.
Sharing work on developer laptops is inconvenient because developers run local HTTP servers without stable IP addresses or DNS entries.
Significant differences between desktop environments increase debugging difficulty.

These issues caused us to abandon local Docker environments and continue using local native development environments. But the challenges of debugging complex multi-system issues persisted, especially as we added more services to our evolving microservice architecture.

Successful Solution: Kubernetes-Powered Beta

We devised a system we call Beta to provide consistency and repeatability to all our development environments by leveraging the deployment, management, and scaling features of cloud services.

Beta gives each developer their own cloud environment in Kubernetes. Each environment contains an individual set of core Node.js apps, but all users share backend services and databases. Kubernetes provides the consistency and portability of Docker containers and adds a layer of provisioning, scheduling, network management, and service discovery for full container life cycle management. We leave stateful services like data storage in a single shared environment outside Kubernetes so that developers have the same data to work with.

Consider the different ways different users use environments:

Product Management: wants to see, share and test upcoming changes with real data
Developers: value fast iteration and have personal preferences for tools
QA: requires build consistency and repeatability
DevOps: requires reliable deployment and rollback capabilities

Kubernetes: Why It Works

We started adopting Kubernetes earlier this year and were pleased with its power and flexibility in managing production applications. Kubernetes is a distributed management system for running Docker containers. Docker provides lightweight process isolation and a solid set of tools for building and running containers. However, Docker containers are a challenge to manage, particularly in a large dynamic system. That’s why we selected Kubernetes as a higher layer tool to control deployment and management of containers.

Inspired by the success of using Kubernetes to manage Docker containers for production, we revisited the idea of using Docker for development environments, but this time using Kubernetes in AWS instead of Docker on local machines. By running our Beta environment in Kubernetes, we get Docker’s strengths of lightweight processes and consistency, plus the added management features that make it fast and take management responsibilities off developers. Then we use custom scripts to tie it all together into a complete test environment.

How Developers Use Beta

With one command, all the following is done for our developers:

Provision, schedule, and deploy a new Beta environment
Configure personalized Node.js containers
Configure containers to use shared backing services
Start up Node.js HTTP servers
Expose the network correctly
Configure inter-service routing
Set up DNS
Terminate SSL
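
To make the provisioning step more concrete, here is a minimal sketch of how a per-developer environment could be described to Kubernetes. It is not our actual tooling: the `beta_manifests` function, the `beta-` name prefix, the labels, port 3000, and the example registry and hostnames are all hypothetical, and it uses the current `apps/v1` Deployment API purely for illustration.

```python
import json


def beta_manifests(user, image, shared_env):
    """Build a Deployment and a Service for one developer's Beta.

    Every name here (labels, port 3000, the "beta-" prefix) is an
    illustrative assumption, not the real Beta configuration.
    """
    name = f"beta-{user}"
    labels = {"app": "beta", "owner": user}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "node-app",
                            "image": image,
                            "ports": [{"containerPort": 3000}],
                            # Point the personal container at the shared
                            # backing services via environment variables.
                            "env": [
                                {"name": key, "value": value}
                                for key, value in sorted(shared_env.items())
                            ],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "selector": labels,
            "ports": [{"port": 80, "targetPort": 3000}],
        },
    }
    return [deployment, service]


if __name__ == "__main__":
    manifests = beta_manifests(
        "alice",
        "registry.example.com/node-app:master",
        {"API_HOST": "api.shared.example.com"},
    )
    # Wrap both objects in a v1 List so they can be applied together.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`. Steps such as DNS and SSL termination would be handled outside these two objects.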

With Beta, updating to the most recent tested environment is a one-command, two-minute operation. With startup time so quick, it lets us change how we think about testing.

A Beta environment is created from the most recent build of our master branch that passed its unit tests. It is already compiled and built into a working Docker image. Docker image layers are usually cached, so startup time is quick. Configuration is managed centrally, which reduces the impact on developers by incorporating service changes automatically, but Beta environments are also individually configurable for situations that require it.

It is possible to use Beta as a development environment, but because developers regularly destroy their Beta environment, they generally edit files on their local machines and then move them to Beta for testing. To support this most common workflow of editing locally and then pushing to Beta for testing, we created a tool to automatically rsync files from local environments into Beta. Developers can still use local development environments for editing and testing, but Beta is their fully featured personal testing environment as well as a usable development environment. Developers and QA also frequently pull different branches directly to their Beta for testing.

A typical developer workflow is:

A developer creates their Beta, which has the latest successful master build.
A developer edits files locally on their Mac.
A custom rsync tool automatically mirrors their local changes into their Beta environment.
The developer tests in Beta. Most Node.js modules don’t need recompilation because Beta has the application pre-compiled. Some changes require module compilation or a gulp build, but only incremental recompilation.
Once a developer has tested their changes, they push a git branch from their local machine to GitHub and create a pull request.
Once a pull request is approved, it can be merged into master, which triggers a CircleCI build.
When CircleCI builds and tests master successfully, a new Beta image is created.
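
The mirroring step in this workflow can be sketched as a small wrapper that builds an rsync invocation. This is only an illustration under stated assumptions: the `rsync_command` helper, the default exclude list, and the example destination are hypothetical, and the real tool also has to detect file changes and reach the container, which is not shown here.

```python
import shlex


def rsync_command(local_dir, remote, excludes=(".git", "node_modules")):
    """Return an rsync argv that mirrors local_dir into a remote path.

    `remote` is an rsync destination such as "user@host:/path"; how a
    Beta container is actually reached is an assumption left to the caller.
    """
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    src = local_dir.rstrip("/") + "/"
    dest = remote.rstrip("/") + "/"
    # -a preserves file attributes, -z compresses, --delete removes
    # remote files that no longer exist locally.
    cmd = ["rsync", "-az", "--delete"]
    for pattern in excludes:
        # Excluded paths are neither sent nor deleted on the remote side,
        # so pre-built modules in the Beta image are left alone.
        cmd.append(f"--exclude={pattern}")
    cmd += [src, dest]
    return cmd


if __name__ == "__main__":
    cmd = rsync_command("/Users/alice/src/app", "alice@beta-alice.example.com:/app")
    print(shlex.join(cmd))
```

Excluding `node_modules` reflects the point above that Beta already has the application pre-compiled, so only source changes need to travel.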

Several key features make Beta a self-service, instant, and up-to-date test environment:

Continuous integration runs unit tests on every master commit
Continuous integration builds a complete, updated Beta image only when unit tests pass
Containers run light enough to give developers their own cloud managed environment
Automation makes creating Beta environments fast and easy
Kubernetes automatically schedules and provisions Docker containers
Configuration management gives each Beta a standard configuration while allowing limited customization
Docker image layer caching on Kubernetes nodes lets containers start quickly
Kubernetes service discovery enables using DNS everywhere
Wildcard DNS covers all Betas and enables full SSL support (important because bypassing SSL validation on mobile browsers is difficult)
A custom nginx Kubernetes Service, known as the “uber” service because it handles all Beta traffic, dynamically routes each HTTP request to the appropriate user’s Beta containers
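
The routing decision made by the “uber” service can be sketched in a few lines. Our implementation lives in nginx configuration, so the Python below is only a model of the logic, and the `upstream_for_host` function, the `beta.example.com` domain, and the `beta-<user>` service naming are hypothetical.

```python
import re

# A single DNS label: lowercase letters, digits, and inner hyphens.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def upstream_for_host(host, base_domain="beta.example.com"):
    """Map a request's Host header to a per-user Kubernetes Service name.

    Returns None when the host is not a direct subdomain of base_domain.
    The "beta-<user>" naming is an illustrative assumption.
    """
    # Drop any port, normalize case, and ignore a trailing dot.
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    user = hostname[: -len(suffix)]
    # Reject empty or multi-label prefixes such as "a.b".
    if not _LABEL.match(user):
        return None
    return f"beta-{user}"


if __name__ == "__main__":
    for host in ("alice.beta.example.com", "Bob.beta.example.com:443", "example.org"):
        print(host, "->", upstream_for_host(host))
```

Because wildcard DNS sends every Beta hostname to the same entry point, a single rule like this is enough to serve all Betas, and Kubernetes service discovery resolves the returned name to the right containers.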

Positive Results

The Beta architecture has been a significant boost to our developer productivity by improving the speed and consistency of testing. The transition has required us to re-work our build and test processes as we moved away from a static QA environment. However, the new paradigm has almost entirely removed discrepancies between dev and QA environments.

Because Beta environments are identical, we have greatly reduced “it works for me” conversations that spawn back-and-forth debugging sessions. These problems frequently plague local development environments because each desktop is unique. Betas are designed primarily for consistency, not flexibility. Consistency helps narrow debugging to the changes a developer has made to their own Beta. Sometimes bugs affect all Betas, but these issues are naturally easy to reproduce and debug.

Beta also makes it easier for non-developers to dabble in the source code to test light changes, without the effort required to set up and maintain a complete local development environment.

Fast provisioning and startup time mean that it’s easier to start a clean Beta environment than to troubleshoot a complex broken environment. So, developers regularly destroy and recreate their Beta environment when their environment might be inconsistent or they think a component changed underneath them.

Other extra benefits of moving off local environments to Beta:

Compile, build, and runtime are identical between Beta and production
Operational equivalence of developer, QA, and production environments
Eliminates the need to debug Mac OS, VirtualBox, and Vagrant. All environments are Ubuntu on Kubernetes
Beta in Kubernetes helps developers understand Kubernetes and our production environment
Ability to use DNS instead of local IPs, allowing seamless access for Product / Design / QA / Eng
Automation of Beta management processes helps the organization scale much better than automating management of desktop environments

Drawbacks of moving to Beta:

Beta requires developers to log into the VPN to do development work
Shared service and data storage outages still affect many internal developers
Developers don’t learn the service dependencies in the infrastructure
Kubernetes can reschedule stateless containers unexpectedly, which causes confusion because code changes made on them are reverted

In particular, having all Betas share data stores is a mixed blessing. Problems are more likely to become visible with multiple developers using the same data, but it is possible to corrupt other users’ data. We will likely move toward individual data stores for each Beta, but it will take planning and work to do properly.

Future Work

In the near future, each developer will be able to run the full QA integration test suite against their own Beta environment, letting developers catch their own bugs before our busy QA team sees them.

We are still improving the developer tools, but a primary goal of Beta is to make it so easy that new developers can be productive as quickly as possible. We will likely spend time continuing to improve hybrid local and Beta environments. And not all services are in Beta, but we continue to move more services there.

In the near term, we hope to give each user personal versions of any stateless services they want and use shared services for things which they don’t want to manage themselves. This will give service developers the ability to modify and break services in their own environment before pushing changes to all other internal developers. On the other hand, keeping mostly shared services minimizes utilization and reduces version drift between environments. Provisioning each user a complete environment with all services is inefficient and can create a large number of version and configuration incompatibilities. Shared services provide all developers an identical view which helps maximize visibility of moving parts and helps surface common issues faster.

In the long term, we hope to give fully isolated data stores to developers who want them. In particular, QA could make use of clean data stores for fully repeatable test runs. This is a bigger time and infrastructure investment, but one we believe is worth making.

Altogether, Beta has greatly improved the speed and productivity of developers. The changes are still in progress, but Beta has proven to be a solid base on which we can build to deliver more code faster, with higher quality and less frustration.