Beta Architecture: Scaling Developer Environments With Kubernetes

December 8, 2016

Here at Branch, we move fast. It’s so critical to the success of our company that it is one of our core values. We are the fastest moving company I’ve worked for.

One of the keys to moving fast is allowing our development teams to iterate and test their code quickly and consistently. This means our development, deployment, and release infrastructure is key to being able to move swiftly. That’s why we invest in making sure developers are productive. I have a personal passion for empowering developers, specifically because nothing is more demoralizing than developers needlessly held back by tools or process.

Problem: Local Development Environments Are Difficult to Manage

Our core apps are written in Node.js and call many services written primarily in Java. Our developers use Macs, but local development environments are inconsistent, and our server configuration management system was designed only for Linux and wouldn’t work on the Mac. Developers self-manage the components on their Macs, including Node.js, correct versions of packages, build tools, and personal IDEs. Some developers also develop Java services, which require a different set of tools including Java, Maven, another IDE, and more package management tools. Setting a Mac up once isn’t hard, but local environments become a frequent and significant source of problems as tools, components, and services go out of date, causing developers to waste time troubleshooting version incompatibilities unique to each desktop. And as we develop new core services, the complexity of the dependencies can increase exponentially.

Failed Solution: Docker on the Desktop

Last year we attempted to use Docker on the desktop, as many organizations do, to create consistent and replicated environments on Macs, but multiple attempts exposed the problems of this approach.

  • Docker on the desktop is difficult to manage. Exposing ports, running multiple containers, and linking containers are all difficult to do. Even with good tooling, diagnosing local problems is difficult and can be entangled with application issues.
  • Local Docker environments require developers to understand too much about Docker, Vagrant, and Docker networking.
  • Running many containers is too memory and CPU intensive for many desktops.
  • Managing local databases requires frequent schema maintenance, and differences between each developer’s database create difficult-to-reproduce bugs.
  • Developers are forced to either run every microservice or understand a complex web of service dependencies in order to run only the components they need.
  • Sharing work on developer laptops is inconvenient because developers run local HTTP servers without stable IP addresses or DNS entries.
  • Significant differences between desktop environments increase debugging difficulty.

These issues caused us to abandon local Docker environments and continue using local native development environments. But the challenges of debugging complex multi-system issues continued, especially as we continued to add more services into our evolving microservice architecture.

Successful Solution: Kubernetes-Powered Beta

We devised a system we call Beta to provide consistency and repeatability to all our development environments by leveraging the deployment, management, and scaling features of cloud services.

Beta gives each developer their own cloud environment in Kubernetes. It contains an individual set of core Node.js apps, but all users share backend services and databases. Kubernetes provides the consistency and portability of Docker containers, then adds a layer of provisioning, scheduling, network management, and service discovery to provide full container life cycle management. We leave stateful services like data storage in a single shared environment outside Kubernetes so that developers have the same data to work with.

Consider the different ways different users use environments:

  • Product Management: wants to see, share and test upcoming changes with real data
  • Developers: value fast iteration and have personal preferences for tools
  • QA: requires build consistency and repeatability
  • DevOps: requires reliable deployment and rollback capabilities

Kubernetes: Why it Works

We started adopting Kubernetes earlier this year and were pleased with its power and flexibility in managing production applications. Kubernetes is a distributed management system for running Docker containers. Docker provides lightweight process isolation and a solid set of tools for building and running containers. However, Docker containers are a challenge to manage, particularly in a large dynamic system. That’s why we selected Kubernetes as a higher layer tool to control deployment and management of containers.

Inspired by the success of using Kubernetes to manage Docker containers in production, we revisited the idea of using Docker for development environments, this time with Kubernetes in AWS instead of Docker on local machines. By running our Beta environment in Kubernetes, we get Docker’s strengths of lightweight processes and consistency, plus the added management features that make it fast and take management responsibilities off developers. We then use custom scripts to tie everything together into a complete test environment.

How Developers Use Beta

With one command, all the following is done for our developers:

  • Provision, schedule, and deploy a new Beta environment
  • Configure personalized node containers
  • Configure containers to use shared backing services
  • Start up Node.js HTTP servers
  • Expose the network correctly
  • Configure inter-service routing
  • Set up DNS
  • Terminate SSL
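
To make the provisioning step more concrete, here is a minimal sketch of how a script might generate a per-developer Kubernetes Deployment manifest. It is an illustration only, not our actual tooling: the function name `beta_deployment`, the `beta-<user>` naming scheme, the labels, the container port, and the image name are all assumptions. It also uses the current `apps/v1` Deployment schema, which is newer than what was available when this post was written.

```python
import json


def beta_deployment(user, image, shared_env):
    """Return a Deployment manifest (as a dict) for one developer's Beta.

    All names here (labels, container name, port) are hypothetical.
    """
    name = f"beta-{user}"
    labels = {"app": "beta", "owner": user}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "node-app",
                            "image": image,
                            "ports": [{"containerPort": 8080}],
                            # Point the personal containers at shared backing services.
                            "env": [
                                {"name": key, "value": value}
                                for key, value in sorted(shared_env.items())
                            ],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = beta_deployment(
        "alice",
        "registry.example.com/beta:latest",  # hypothetical image name
        {"DB_HOST": "db.shared.internal"},   # hypothetical shared service
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped into `kubectl apply -f -`. A real one-command tool would also need to create the Service, DNS, and routing pieces listed above.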

With Beta, updating to the most recent tested environment is a one-command, two-minute operation. Startup is so quick that it has let us change how we think about testing.

A Beta environment is created from the most recent build of our master branch, which passed its unit tests. It is already compiled and built into a working Docker image. Docker image layers are usually cached, so startup time is quick. Configuration is managed centrally, which reduces impact to developers by incorporating service changes automatically, but Beta environments are also individually configurable for situations that require it.
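
One way to picture central configuration with per-Beta customization is a recursive merge of a small override file on top of shared defaults. The sketch below assumes configuration is held as nested dictionaries; the function name `merge_config` and the example keys are hypothetical, not a description of our configuration system.

```python
from copy import deepcopy


def merge_config(central, overrides):
    """Return central config with per-Beta overrides applied on top.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the central value. Neither input is modified.
    """
    merged = deepcopy(central)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    central = {
        "log_level": "info",
        "services": {
            "links": "http://links.shared.internal",  # hypothetical endpoints
            "users": "http://users.shared.internal",
        },
    }
    # A developer overrides only what they need; everything else stays central.
    overrides = {"log_level": "debug"}
    print(merge_config(central, overrides))
```

Keeping overrides small means a change to the central defaults reaches every Beta automatically the next time it is created.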

It is possible to use Beta as a development environment, but because developers regularly destroy their Beta environments, they generally edit files on their local machines and then move them to Beta for testing. To support this most common workflow of editing locally and then pushing to Beta for testing, we created a tool that automatically rsyncs files from local environments into Beta. Developers can still use local development environments for editing and testing, but Beta is their fully featured personal testing environment as well as a usable development environment. Developers and QA also frequently pull different branches directly to their Beta for testing.

A typical developer workflow is:

  1. A developer creates their Beta, which has the latest successful master build.
  2. The developer edits files locally on their Mac.
  3. A custom rsync tool automatically mirrors their local changes into their Beta environment.
  4. The developer tests in Beta. Most Node.js modules don’t need recompilation because Beta has the application pre-compiled. Some changes require module compilation or a gulp build, but only incremental recompilation.
  5. Once the developer has tested their changes, they push a git branch from their local machine to GitHub and create a pull request.
  6. Once the pull request is approved, it can be merged into master, which triggers a CircleCI build.
  7. When CircleCI builds and tests master successfully, a new Beta image is created.
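
The mirroring in step 3 can be sketched as a small wrapper that builds an rsync command line. This is an illustration under stated assumptions, not our actual tool: it assumes each Beta is reachable over SSH at a hypothetical `<user>.beta.example.com` host and that the app lives in `/app`. The rsync flags themselves (`-a`, `-z`, `--delete`, `--exclude`) are standard.

```python
import shlex


def rsync_command(local_dir, user, remote_dir="/app",
                  excludes=("node_modules", ".git")):
    """Build the argv for mirroring a local checkout into a user's Beta.

    The host name pattern and remote path are hypothetical.
    """
    host = f"{user}.beta.example.com"
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    source = local_dir.rstrip("/") + "/"
    command = ["rsync", "-az", "--delete"]
    for pattern in excludes:
        # Excluded paths are neither sent nor deleted on the remote side,
        # so the pre-compiled modules in the Beta image are left alone.
        command.append(f"--exclude={pattern}")
    command += [source, f"{host}:{remote_dir}/"]
    return command


if __name__ == "__main__":
    argv = rsync_command("/Users/alice/src/webapp", "alice")
    print(" ".join(shlex.quote(part) for part in argv))
```

A real tool would run this whenever a file changes, for example from a file watcher, and pass the list to `subprocess.run`.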

Several key features make Beta a self-service, instant, and up-to-date test environment:

  • Continuous integration runs unit tests on every master commit
  • The continuous integration build completes and updates the Beta image only when unit tests pass
  • Containers run light enough to give developers their own cloud managed environment
  • Automation makes creating Beta environments fast and easy
  • Kubernetes automatically schedules and provisions Docker containers
  • Configuration management gives each Beta a standard configuration, but with minimal customizability
  • Docker layer image caching on Kubernetes nodes lets containers start quickly
  • Kubernetes service discovery enables using DNS everywhere
  • Wildcard DNS covers all Betas and enables full SSL support (important because bypassing SSL validation on mobile browsers is difficult)
  • A custom nginx Kubernetes Service, known as the “uber” service because it handles all Beta traffic, dynamically routes each HTTP request to the appropriate user’s Beta container
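
The routing idea behind the “uber” service can be sketched in a few lines: take the Host header of a request that arrived through the wildcard DNS entry, extract the user, and map it to that user's in-cluster service name. The real service is nginx, so this Python version only illustrates the logic. The domain `beta.example.com`, the `beta-<user>` service naming, and the `beta` namespace are assumptions; the `<service>.<namespace>.svc.cluster.local` form is the standard Kubernetes service DNS name under the default cluster domain.

```python
import re
from typing import Optional

BETA_DOMAIN = "beta.example.com"  # hypothetical wildcard domain (*.beta.example.com)

# A single DNS label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def upstream_for_host(host_header: str, namespace: str = "beta") -> Optional[str]:
    """Map 'alice.beta.example.com[:port]' to alice's in-cluster service name.

    Returns None when the host is not a single-label subdomain of BETA_DOMAIN,
    so the caller can answer with a 404 instead of proxying.
    """
    host = host_header.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + BETA_DOMAIN
    if not host.endswith(suffix):
        return None
    user = host[: -len(suffix)]
    if not _LABEL.fullmatch(user):
        return None
    return f"beta-{user}.{namespace}.svc.cluster.local"
```

Validating the label before building the upstream name keeps arbitrary Host headers from being turned into arbitrary internal lookups.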

[Figure: How Betas Are Built]

Positive Results

The Beta architecture has been a significant boost to our developer productivity by improving the speed and consistency of testing. The transition has required us to re-work our build and test processes as we moved away from a static QA environment. However, the new paradigm has almost entirely removed discrepancies between dev and QA environments.

Because Beta environments are identical, we have greatly reduced “it works for me” conversations that spawn back and forth debugging sessions. These problems frequently plague local development environments because each desktop is unique. Betas are designed primarily for consistency, not flexibility. Consistency helps reduce debugging to the changes a developer has made to their own Beta. Sometimes bugs affect all Betas, but these issues are naturally easy to reproduce and debug.

Beta also makes it easier for non-developers to dabble in the source code and test light changes, without the effort required to set up and maintain a complete local development environment.

Fast provisioning and startup time mean that it’s easier to start a clean Beta environment than to troubleshoot a complex broken environment. So, developers regularly destroy and recreate their Beta environment when their environment might be inconsistent or they think a component changed underneath them.

Other extra benefits of moving off local environments to Beta:

  • Compile, build, and runtime are identical between Beta and production
  • Operational equivalence of developer, QA, and production environments
  • Eliminates the need to debug Mac OS, VirtualBox, and Vagrant. All environments are Ubuntu on Kubernetes
  • Beta in Kubernetes helps developers understand Kubernetes and our production environment
  • Ability to use DNS instead of local IPs, allowing seamless access for Product / Design / QA / Eng
  • Automating Beta management processes helps the organization scale much better than automating management of desktop environments

Drawbacks of moving to Beta:

  • Beta requires developers to log into the VPN to do development work
  • Shared service and data storage outages still affect many internal developers
  • Developers don’t learn the service dependencies in the infrastructure
  • Kubernetes can unexpectedly reschedule stateless containers, which reverts any code changes made on them and causes confusion

In particular, having all Betas share data stores is a mixed blessing. Problems are more likely to become visible with multiple developers using the same data, but it is possible to corrupt other users’ data. We will likely move toward individual data stores for each Beta, but it will take planning and work to do properly.

Future Work

In the near future, each developer will be able to run the full QA integration test suite against their own Beta environment, letting developers catch their own bugs before our busy QA team sees them.

We are still improving the developer tools, but a primary goal of Beta is to make it so easy that new developers can be productive as quickly as possible. We will likely spend time continuing to improve hybrid local and Beta environments. And not all services are in Beta, but we continue to move more services there.

In the near term, we hope to give each user personal versions of any stateless services they want and use shared services for things which they don’t want to manage themselves. This will give service developers the ability to modify and break services in their own environment before pushing changes to all other internal developers. On the other hand, keeping mostly shared services minimizes utilization and reduces version drift between environments. Provisioning each user a complete environment with all services is inefficient and can create a large number of version and configuration incompatibilities. Shared services provide all developers an identical view which helps maximize visibility of moving parts and helps surface common issues faster.
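
As a sketch of how this hybrid might be resolved per developer, the function below picks a personal endpoint for each stateless service a developer opts into and falls back to the shared endpoint for everything else. The `Service` type, the `plan_endpoints` name, and the URL patterns are hypothetical and only illustrate the decision logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    stateless: bool
    shared_url: str


def plan_endpoints(catalog, user, wanted):
    """Return {service name: URL} for one developer's Beta.

    Services named in `wanted` get a personal copy, which is allowed only
    for stateless services; all others use the shared instance.
    """
    known = {service.name for service in catalog}
    unknown = set(wanted) - known
    if unknown:
        raise ValueError(f"unknown services: {sorted(unknown)}")

    plan = {}
    for service in catalog:
        if service.name in wanted:
            if not service.stateless:
                raise ValueError(f"{service.name} is stateful and must stay shared")
            # Hypothetical naming scheme for a personal copy in the cluster.
            plan[service.name] = f"http://{service.name}-{user}.beta.svc.cluster.local"
        else:
            plan[service.name] = service.shared_url
    return plan
```

Defaulting to shared endpoints keeps utilization low, and a developer takes on a personal copy only for the services they are actively changing.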

In the long term, we hope to give fully isolated data stores to developers who want them. In particular, QA could make use of clean data stores for fully repeatable test runs. This is a bigger time and infrastructure investment, but one we believe is worth making.

Altogether, Beta has greatly improved the speed and productivity of developers. The changes are still in progress, but Beta has proven to be a solid base that we can build on to deliver more code faster, with higher quality and less frustration.

  • Robbie McKinstry

    Interesting. Is there any intention to open source Beta? I feel like this is the kind of thing I could personally use.
    Right now, I have to virtualize a Kubernetes instance and three other services in addition to MySQL on a Vagrant box as a development environment for my team. This is far more complicated than I would like, and I’d prefer to explore a cloud solution. It’s just getting too unwieldy locally.

    • Alex Bauer

      @robbiemckinstry:disqus, I don’t believe we have made any plans to open source Beta yet, but I use it every day myself and agree it’s a fantastic system. I will definitely pass this request along to our infra team.

      • Robbie McKinstry

        Thanks guys! Much appreciated! I love the idea and I love the post!

    • Hubert Chen

      There are parts of it that we could open source particularly some of the utilities. But it’s more of an architectural approach than being something that we could open up and everyone could deploy out of the box. There’s a fair amount of application specific stuff that would be entirely thrown out for most people. But I’ll try to see what parts of it to make open.

  • Steve Domino (https://nanobox.io)

    wow… very well done with the article, thank you!

    As I was reading this I was blown away by how you outlined exactly the issues we’ve experienced with Docker and Kubernetes, and basically came to the exact same conclusion we did.

    @aeromusek:disqus, we’ve even gone as far as you guys and created something that sounds like it’s very similar to Beta, to address the very same problems lol.

    • Alex Bauer

      Hey @skdomino:disqus,

      Thanks for the comment! I shared this with our infrastructure team, and they all got warm fuzzies. Glad to hear we aren’t the only ones with this issue, and we’re very pleased with how the Beta system has been working out so far!

  • BBHoss

    Why did you go with running the environments in the cloud instead of using minikube to run it locally? With minikube, you can even mount the hostPath into the containers so you don’t have to worry about rsyncing files.

    • Hubert Chen

      The biggest barrier is that the whole application is large including dozens of java apps and multiple databases. If we were to run minikube, there are too many dependencies to try and run the whole thing locally. It might be possible to stub out parts, but to do so would be a lot of work and the app would still be crippled and it requires a developer to know which parts are stubbed out and how each piece is interacting with their portion. It’s much simpler to just assume they have a full VPN connection into our development environment and can operate inside it. Using hostPath part would definitely be simpler than rsync, but everything else would be hard to replicate.

  • Hemanth Malla

    Thank you for the wonderful post 🙂
    Are you guys looking at moving stateful services also inside each beta anytime soon ?
    Shared databases can cause developers unnecessary frustration rite.
    If there were attempts already in that front, any experiences to share ?

    • Hubert Chen

      Moving databases into Kubernetes isn’t a high priority. There’s a couple of reasons: some application, some Kubernetes.

      The application reason is that developers aren’t modifying the database that much so they don’t need the full isolation and protection of putting each DB into their own Beta. But while testing it is useful for all developers to have all the same data and the same view of it. So if I put a new record into the DB, everyone can see it and we have a better chance of detecting hard to find bugs instead of just relying on my single Beta environment to test it.

      The Kubernetes reason is that my experience with stateful services and EBS volumes in Kubernetes hasn’t been that positive. I’ve seen frequent failures to re-attach EBS volumes to a node when pods reschedule. It’s supposedly improved significantly in 1.5.3+, but I haven’t tried newer versions yet. For non-production DBs in Beta, that’s probably OK, but I wouldn’t put a production DB into Kubernetes, and I prefer to keep the non-production environment similar to production.

      That said, we may have an optional configuration for DBs in the future.