Lyft used cloud-based isolated environments for a variety of purposes, including end-to-end testing. As the number of microservices increased, tests using these environments became harder to scale and lost much of their value. Recent articles describe how Lyft moved to testing with request isolation in a shared staging environment, and to using acceptance testing to gate production deployments.
Lyft built a Docker-based container orchestration environment that engineers could use for testing. It consisted of tooling that managed a local virtual machine and its configuration, including database seeding, package and image downloads, and installation. Originally intended for local use, the environment was later moved to the cloud and named Onebox.
Engineers could use Onebox to run the service they wanted to test, along with its dependencies and related data stores: in a nutshell, a miniature version of Lyft's systems. Eventually, Onebox reached a point where it could no longer scale.
Lyft also has a shared staging environment that is production-like in terms of (simulated) traffic and service levels. Because of these characteristics, teams were already deploying new service features there to get feedback under realistic conditions. It was a good candidate to replace Onebox for testing new features, but it offered no isolation: an unstable new feature could cause problems for the entire staging environment.
The solution was to implement "staging overrides" in the staging environment, which "fundamentally shifted [Lyft's] approach to the isolation model: instead of providing fully isolated environments, [they] isolated requests within a shared environment."
Rather than isolating entire services, the new approach isolates requests.
With this technique, engineers can deploy and launch service instances that do not participate in Lyft's service mesh and therefore do not interfere with regular traffic. These are called "offloaded deployments". When engineers want to test a new feature, they add specific headers to the request that ensure it is routed to the new instance.
Lyft built its service mesh with Envoy, so all traffic flows through Envoy sidecars. When a service is deployed, it is registered in the service mesh, becomes discoverable, and starts processing requests from the other services in the mesh. An offloaded deployment carries metadata that prevents the control plane from making it discoverable.
Engineers create offloaded deployments directly from their pull requests by invoking a dedicated GitHub bot. Using Lyft's proxy application, they can attach protobuf-encoded metadata to requests as OpenTracing baggage. This metadata is propagated through all services for the lifetime of the request, regardless of the service implementation language, request protocol, or queues between them. Lyft's Envoy HTTP filter was modified to support staging overrides, so that it routes the request to the offloaded instance based on the request's override metadata.
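To make the mechanism more concrete, here is a minimal Python sketch of how a routing decision could honor a staging override. It is not Lyft's implementation. The baggage key `staging-overrides`, the `Instance` and `Registry` types, and the use of JSON in place of protobuf are all hypothetical stand-ins. Offloaded instances are excluded from the discoverable set. The override metadata travels unchanged with the request across every hop.

```python
import json
from dataclasses import dataclass, field

# Hypothetical baggage key; Lyft's actual key and protobuf schema are not described here.
OVERRIDE_BAGGAGE_KEY = "staging-overrides"


@dataclass
class Instance:
    service: str
    address: str
    # Offloaded instances carry metadata telling the control plane not to advertise them.
    offloaded: bool = False


@dataclass
class Registry:
    instances: list = field(default_factory=list)

    def discoverable(self, service):
        """Endpoints a control plane would hand to sidecars: offloaded instances are excluded."""
        return [i for i in self.instances if i.service == service and not i.offloaded]

    def by_address(self, address):
        """Look up any instance, discoverable or not, by its address."""
        return next((i for i in self.instances if i.address == address), None)


def parse_overrides(baggage):
    """Decode override metadata (JSON here, standing in for protobuf) into {service: address}."""
    raw = baggage.get(OVERRIDE_BAGGAGE_KEY)
    return json.loads(raw) if raw else {}


def route(registry, service, baggage):
    """Pick an upstream for `service`, honoring a staging override if the request carries one."""
    target = parse_overrides(baggage).get(service)
    if target is not None:
        inst = registry.by_address(target)
        if inst is not None and inst.service == service:
            return inst
    # No (valid) override: fall back to regular, discoverable mesh instances only.
    candidates = registry.discoverable(service)
    if not candidates:
        raise LookupError(f"no discoverable instance for {service}")
    return candidates[0]  # a real proxy would load-balance here


def call_chain(registry, services, baggage):
    """Simulate a request hopping through `services`; the baggage is forwarded unchanged at each hop."""
    return [route(registry, s, baggage).address for s in services]


if __name__ == "__main__":
    registry = Registry([
        Instance("rides", "rides-1"),
        Instance("pricing", "pricing-1"),
        Instance("pricing", "pricing-pr-123", offloaded=True),  # hypothetical PR deployment
    ])
    print(call_chain(registry, ["rides", "pricing"], {}))
    baggage = {OVERRIDE_BAGGAGE_KEY: json.dumps({"pricing": "pricing-pr-123"})}
    print(call_chain(registry, ["rides", "pricing"], baggage))
```

The baggage is forwarded as-is at every hop. As a result, an override for `pricing` takes effect even when the request first passes through `rides`, and requests without the metadata never reach the offloaded instance.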
Engineers also used Onebox environments to run integration tests in CI. As the number of microservices grew, so did the number of tests and their runtime, while their efficacy declined for the same reasons that led to Onebox being retired.
Lyft engineers reviewed the existing end-to-end tests and found that only a subset covered critical business flows.
These critical tests were converted into acceptance tests. At the same time, Lyft moved away from a model in which each service has its own set of integration tests, toward a small, centralized collection of end-to-end acceptance tests. Centralizing acceptance tests eliminated test duplication, made it easier to keep tests relevant, and allowed code to be reused across tests.
Lyft decided to stop running end-to-end tests as part of the "inner loop" CI. Instead, acceptance tests run in the staging environment after each deployment.
The acceptance test results determine whether subsequent production deployments proceed, so they effectively serve as a new gate for production deployments.
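The gating flow can be sketched as follows, assuming a hypothetical pipeline. In this sketch, `deploy_staging` and `deploy_production` are injectable callables, and the centralized acceptance tests are a dictionary of zero-argument checks. The function names and the test names in the usage are illustrative, not Lyft's. A release is promoted to production only if every acceptance test passes against staging, and a test that raises counts as a failure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AcceptanceResult:
    name: str
    passed: bool


def run_acceptance_suite(tests: Dict[str, Callable[[], bool]]) -> List[AcceptanceResult]:
    """Run each centralized acceptance test against staging; an exception counts as a failure."""
    results = []
    for name, test in tests.items():
        try:
            ok = bool(test())
        except Exception:
            ok = False
        results.append(AcceptanceResult(name, ok))
    return results


def deploy_pipeline(version, deploy_staging, tests, deploy_production):
    """Hypothetical gate: deploy to staging, run acceptance tests, promote only if all pass."""
    deploy_staging(version)
    failed = [r.name for r in run_acceptance_suite(tests) if not r.passed]
    if failed:
        # Gate closed: production is not touched.
        return {"version": version, "promoted": False, "failed": failed}
    deploy_production(version)
    return {"version": version, "promoted": True, "failed": []}


if __name__ == "__main__":
    log = []
    suite = {"request_ride": lambda: True, "complete_payment": lambda: True}  # illustrative names
    print(deploy_pipeline("v2", lambda v: log.append(("staging", v)), suite,
                          lambda v: log.append(("production", v))))
    print(log)
```

Production deployment runs only after the suite passes, so a failing acceptance test blocks the rollout rather than merely reporting it.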
Lyft extended an existing traffic simulation engine so that engineers could also use it for acceptance testing. Because an existing engine was extended, tests are described using a custom configuration syntax; Lyft acknowledges that "existing test frameworks, such as cucumber/gherkin, might have served [them] better if [they had] started from scratch."
After switching from integration tests run on every commit to pre-production acceptance tests, Lyft removed thousands of integration tests or converted them to unit tests. Pull requests are now ready to merge in minutes, and there has been no measurable increase in the number of bugs reaching production.
For more on the topic, Ben Linders interviewed Dave Farley for InfoQ about the relevance and benefits of automated acceptance testing.