Self-Testing Applications

Andrew Gibson
7 min read · Mar 21, 2021

Recently there was discussion on LinkedIn about how testing has changed in our brave new Cloud-native world.

It made me realise that I needed to write up some thoughts I’ve been developing about self-testing applications — an extension to good practice for testing modern applications (particularly in a SaaS environment). In a nutshell: I’d like to see tests granted a place as a primary part of running applications — not relegated to a distinct, ancillary codebase which gets jettisoned prior to release.

For the next article in this series, see: Building a Self-Testing Application

Straight to Production

When I started coding web applications back at the end of the 90s, I was living in cowboy country. Most of the testing we did involved poking around in a locally hosted website to see if we could break something.

This worked surprisingly well. Sure, we missed stuff, and it didn’t scale, but for small or medium applications with only one or two of us working on them, we got things done pretty fast. When we broke stuff, it was usually obvious. Sometimes we’d get into a situation where we couldn’t fix something without breaking something else, and life sucked. But generally, it was ok, especially when the program design was good.

“QA”

It wasn’t until I worked for a fairly large company that I came across a formal process of “QA” for the first time. This involved a QA “environment” to which we deployed code before it went to production.

Test-Driven Development

When TDD came along, I was immediately drawn to the new acronym. In fact, it took me about 10 years before I started to fall out of love with it again — distracted by the refreshing approachability of BDD.

To be honest, I tend to use both TDD and BDD where appropriate. If I’m writing a library, or a complex algorithm, I always use TDD to facilitate a quick red-green-refactor cycle. If I’m writing a web app, I always use BDD — generally with some sort of Cucumber implementation.

Test automation

The other thing I became more and more obsessed with over time was test automation. In the early days of SaaS, this usually meant Selenium or nothing (no disrespect, WinRunner).

To be honest, the overall approach to testing hasn’t changed that much since the start of my career. We’ve added:

  • better tools (like Cypress)
  • collaboration frameworks (like Cucumber)
  • better integration with deployment pipelines
  • better cross-browser and device testing
  • a liberal sprinkling of DevOps

But essentially the pattern has remained much the same. Or at least, so it appeared for a long time.

Test in Production

About five years ago (~2016), I started to use cloud-native services (like deployment slots in Azure and CNAME swaps in AWS Elastic Beanstalk) to achieve almost Test-In-Production functionality (via Blue/Green deployments).

This sort of testing has become more and more popular, and the cool kids now swear by Test-In-Production.

“If you aren’t testing in prod you aren’t testing in reality” — Charity Majors, 2019

There are different approaches to this, but most of the ones I’ve seen essentially boil down to:

  1. Pointing the automation tests at Prod (and decommissioning or de-emphasising the role of a UAT/Staging environment)
  2. Some clever engineering to ensure that you’re not banjaxing a customer’s live data with your tests.
  3. Testing the code in production before the customer gets to use it (e.g. behind a feature flag, or in a dedicated test tenant), as sketched below.
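
To make that a little more concrete, here’s a minimal sketch of what points 2 and 3 can look like in practice: an automated check pointed at the deployed production application, scoped to a synthetic test tenant so that no customer data is touched. The URL, header name, and tenant id below are hypothetical placeholders rather than a prescription; substitute whatever isolation mechanism your platform provides.

```javascript
// Minimal production smoke test, runnable with Node 18+ (which provides a global fetch).
// APP_URL, the x-tenant-id header, and the tenant id are illustrative placeholders.
const assert = require('node:assert');

const BASE_URL = process.env.APP_URL || 'https://app.example.com';
const TEST_TENANT = process.env.TEST_TENANT_ID || 'synthetic-test-tenant';

async function smokeTest() {
  // Hit the real, deployed application, but identify ourselves as the test tenant
  // so nothing we do can touch a customer's live data.
  const res = await fetch(`${BASE_URL}/api/health`, {
    headers: { 'x-tenant-id': TEST_TENANT },
  });
  assert.strictEqual(res.status, 200, `expected 200 from /api/health, got ${res.status}`);
}

smokeTest()
  .then(() => console.log('production smoke test passed'))
  .catch((err) => {
    console.error('production smoke test failed:', err.message);
    process.exit(1); // non-zero exit lets a pipeline or scheduler flag the failure
  });
```

Run on a schedule, or immediately after each deployment, a handful of checks like this give you ongoing confidence that the thing customers are actually using still works.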

Ok, so why are things changing?

For SaaS in particular, the argument for Test-In-Production makes sense to me for three main reasons:

  1. Testing the user’s experience — it gets as close as possible to testing what a user will actually experience. I want my tests to confirm that the deployed, running software works before a user touches it.
  2. Reduction in complexity — if I can shorten the path to production, that’s a win for everyone. Maybe I still run tests prior to deployment, but that can be done in an ephemeral environment which spins up purely to service the deployment pipeline; I don’t need to maintain the fantasy of a permanent, parallel environment mirroring production.
  3. Technology being put into production is less and less about pure computation. The assumption that, given the same input, the system under test will produce the same output doesn’t hold up well for today’s applications. SaaS products and web applications are too complex to test that way; they’re less like a calculator and more like a checkout counter in a busy shop. Testing for formal correctness, while still necessary, is often less important than testing a user journey or experience.

Businesses want quick delivery of functionality into production while ensuring safety and a smooth experience for end users. Engineers are finding ways to do that better, and removing complexity goes a long way to helping us achieve it.

With the rise of SaaS products (whether “cloud native”, containerised, or some variant), two factors in particular encourage this trend:

  1. Deployment — deployments (and rollbacks) are now easy to fully automate, even in secure environments. Deploying an application to production is now a push-button exercise; gone are the days when I had to have my hand scanned by a machine and wear booties over my shoes in order to deploy a release onto the production server.
  2. Application configuration and feature switching — because of the centralising nature of the SaaS business model, there’s almost always an awareness of tenant (or at least user) across much of the environment. That makes it possible to route traffic and to switch features on or off per tenant or per user, which is exactly what safe testing in production relies on (sketched below).
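
As a rough illustration of that second point, here’s a hedged sketch of tenant-aware feature switching, using Express and an in-memory flag map purely for brevity; a real SaaS product would resolve the tenant from authentication and keep its flags in a proper feature-flag service.

```javascript
// Sketch of tenant-aware feature switching in an Express app.
// The header name, tenant ids, and flag names are illustrative only.
const express = require('express');
const app = express();

// Hypothetical flag store: which tenants currently see the new checkout flow.
const featureFlags = {
  'new-checkout': new Set(['synthetic-test-tenant', 'beta-customer-42']),
};

// Resolve a tenant for every request (naively, from a header).
app.use((req, res, next) => {
  req.tenantId = req.header('x-tenant-id') || 'unknown';
  next();
});

app.get('/checkout', (req, res) => {
  if (featureFlags['new-checkout'].has(req.tenantId)) {
    res.send('new checkout flow');      // only test and beta tenants see this
  } else {
    res.send('existing checkout flow'); // every other customer is unaffected
  }
});

app.listen(3000);
```

The shape is the important bit: because every request already knows which tenant it belongs to, switching new code on for a test tenant in production becomes a routing decision rather than a deployment.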

TL;DR — Self-Testing Applications

To me, this (re-)simplification of the deployment experience takes us almost full circle to the simple “cowboy country” model I described at the start of my career: develop and test locally, then deploy to production.

Obviously the components are massively refined in comparison to what they used to be, but the only “extra” component is the suite of automation tests. For a few years now, I’ve been advocating for a design and deployment architecture which pushes that boundary a bit further — something I call self-testing applications.

I often use modern car design as a mental model. When I turn my car on, it checks a whole bunch of things before it will let me set off, and at any time I can glance at the dashboard to see the status of the running car. These indicators tend to fall into one of several categories:

  1. Informational — my headlights are on, the temperature outside, fuel level
  2. Warnings / notifications — a door isn’t closed properly, conditions are slippery, ABS was activated, fuel is low
  3. Errors / critical — there’s a problem with the engine, the handbrake is on

I’d like my web applications to be a bit more “self-aware” in the way that my car is. Why can’t they test themselves when I access them? Or, at least, give me the option to look at a dashboard to see if everything is operational.

And why do I spend so much effort carefully crafting and collaborating on my BDD scenarios, only to ignore what they could tell me exactly when I care most: when the system is being used?

What I really want is something which allows the application to run its own tests, and for testers, or end users, to access a dashboard where they can see whether the application is healthy or not.
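
To make the idea tangible, here’s a minimal sketch, assuming a hypothetical /self-test endpoint: the running application executes its own checks and reports them using the same informational / warning / error categories as the car dashboard. The individual checks are placeholders; in a real application they might be the very BDD scenarios mentioned above.

```javascript
// Sketch of a self-testing endpoint. The checks are stand-ins for real scenarios;
// `level` says how a failing check should be reported, mirroring the dashboard categories.
const http = require('node:http');

const checks = [
  { name: 'uptime', level: 'info', run: () => `${Math.round(process.uptime())}s` },
  {
    name: 'memory', level: 'warning', run: () => {
      const usedMb = process.memoryUsage().heapUsed / 1024 / 1024;
      if (usedMb > 200) throw new Error(`heap at ${usedMb.toFixed(0)} MB`);
      return 'ok';
    },
  },
  {
    name: 'database', level: 'error', run: () => {
      // A real check would ping the database here; this placeholder always passes.
      return 'ok';
    },
  },
];

function runSelfTests() {
  return checks.map(({ name, level, run }) => {
    try {
      return { name, level, status: 'pass', detail: run() };
    } catch (err) {
      return { name, level, status: 'fail', detail: err.message };
    }
  });
}

// Testers (or curious end users) can hit /self-test to see the "dashboard".
http.createServer((req, res) => {
  if (req.url === '/self-test') {
    const results = runSelfTests();
    const healthy = results.every((r) => r.status === 'pass' || r.level === 'info');
    res.writeHead(healthy ? 200 : 500, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ healthy, results }, null, 2));
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(3000);
```

The design choice that matters is that the checks ship with the application and run inside it, so the dashboard reflects the system users are actually touching rather than a copy of it.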

I just want the tests to be granted their rightful place as a primary part of the running application — not relegated to a distinct, ancillary codebase.

Do you hate this idea?

I tend to get quite a lot of pushback on this concept. People are very wedded to the idea that tests don’t belong in production. They argue that the code required to run tests will bloat the application and compromise its quality, whether by degrading performance, hurting the user experience, or weakening security.

I think there’s a lot to be said for these concerns, but ultimately I come back to the idea of the car. Cars are built to perform, be pleasant to use and to protect themselves from outside attack. There are ways to engineer things to be self-diagnosing while retaining these critical qualities. In fact, I’d argue that in modern cars, the diagnostic/information elements are some of the biggest differentiators in the marketplace.

The new electric cars are certainly as much electronic as they are mechanical. While I sometimes miss the old days, I also like that even a non-electric, “non-smart” car stops me from killing myself in ways it couldn’t 50 years ago. Applications are going to start doing this too.

In my next article, I’ll walk through a simplified example using a sample application which illustrates a few of these concepts, sticking to raw JavaScript to level the playing field.

I’m interested in feedback on this type of testing. It seems like the right way forward to me, but I want to discuss these ideas with others and find out where the technology leads us next.

For the next article in this series, see: Building a Self-Testing Application

