Testing Advice Part Two, or What Comes Next

epkatz

I’ve read plenty of blog posts and books on testing in my career that extol the benefits of testing and give great instruction on frameworks and best practices. I don’t often see any thoughts on what happens next. I’m going to skip past the first part and assume that if you’re reading this you a) believe and understand that testing is important and b) have a good sense of how to set up your favorite testing framework.

Culture

The most important part of testing is building a culture of testing. It’s not enough to convince engineers and business stakeholders that tests are important and valuable in the abstract. Everyone can talk in a meeting about how tests lead to more productive engineers. But when crunch time comes around (and in startup land that’s basically every week), they need to feel that testing will help them.

So how do you get there? In my experience, writing tests needs to become a habit, and not writing them needs to become uncomfortable. I’m primarily a Node developer at the moment and I’m learning Ruby. It’s incredibly uncomfortable for me to end a statement without a semicolon. That sounds trivial, but that needs to be the level of discomfort an engineer feels when not writing a test. They need to look at the pull request they’re sending out for review and feel deeply that tests are missing or insufficient. That feeling comes from repetition and from having culture-carriers on the team who already learned that feeling elsewhere. In general, I’ve found that a 3-1 ratio (3 learning to 1 culture-carrier) is sufficient to spread the culture, as long as the 1 engineer is senior, respected and weighting their time heavily towards code review and pairing.

Habit is important and repetition will get you there, but you can speed up the process and make your life easier if writing tests, debugging tests and monitoring tests is an easy task. Most companies under-invest in planning and refactoring their testing infrastructure, so it grows much more haphazardly than their app code. It’s my (and many others’) belief that your testing code should be as clean and easy to maintain as your app code. That might mean prioritizing and pushing for projects around cleaning up your testing. It might mean assigning ownership of different parts of your testing infra to individuals. One example: I once assigned an engineer to come up with a better way to make commonly used mocked object factories easier to discover. We were in a situation where engineers couldn’t find an existing factory and would just create a new one. That reduced each engineer’s motivation to write tests for more complicated use-cases, and even when they pushed through, it was wasted dev time that duplicated work and required significant cognitive overhead.
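To make the factory-discovery idea concrete, here is a minimal sketch of one possible approach: a central registry that every mock factory registers with, so engineers can list what already exists before writing a new one. All names here (`factory`, `build`, `user_factory`) are hypothetical, not what my team actually built.

```python
# Hypothetical sketch: a central registry so engineers can discover existing
# mock factories instead of duplicating them. Names are illustrative.
from typing import Callable, Dict

_FACTORIES: Dict[str, Callable[..., dict]] = {}

def factory(name: str):
    """Register a mock-object factory under a discoverable name."""
    def decorator(fn: Callable[..., dict]):
        if name in _FACTORIES:
            raise ValueError(f"factory {name!r} already exists -- reuse it")
        _FACTORIES[name] = fn
        return fn
    return decorator

def build(name: str, **overrides) -> dict:
    """Build an object from a registered factory, applying overrides."""
    obj = _FACTORIES[name]()
    obj.update(overrides)
    return obj

def list_factories() -> list:
    """Let engineers print or grep what already exists."""
    return sorted(_FACTORIES)

@factory("user")
def user_factory() -> dict:
    return {"id": 1, "name": "Alice", "plan": "free"}
```

An engineer can then call `list_factories()` to see what exists and `build("user", plan="pro")` to get a customized mock, instead of hand-rolling a duplicate.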

It should go without saying that every engineer who contributes to your codebase, whether it’s a complicated system or a small CSS tweak, should ideally be writing some kind of test that verifies the change works and protects against future regressions.

Business stakeholders will find it hard to see the value, but there are strategies here as well. Start tracking your bug rate or other metrics that might be impacted by user-facing bugs (conversion? support tickets?) and put a dollar amount against it. As you improve your testing capacity, show them the trend and prove with numbers that there is real business value here. I would keep charts that showed the user-reported bug rate alongside the average amount of time each engineer spent on a bug that made it into the sprint. I would update this chart and mark it with milestones around our various testing improvement efforts. There was one significant downward trend that started right after we introduced end-to-end tests, and the trend continued to decrease for months as engineers added coverage over new parts of the system. We were easily able to add at least one engineer’s worth of time without hiring a new engineer.
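The arithmetic behind that kind of chart is simple; here is a sketch, where the hours-per-bug and hourly-cost figures are made-up assumptions you would replace with your own tracking data:

```python
# Sketch of the metric described above: weekly user-reported bug counts
# converted into an engineer-time dollar figure. All numbers are
# illustrative assumptions, not benchmarks.
HOURS_PER_BUG = 4     # assumed avg engineer time per bug that enters a sprint
LOADED_RATE = 75      # assumed loaded hourly cost of an engineer, in dollars

weekly_bug_counts = {  # user-reported bugs per week (example data)
    "2023-W01": 14,
    "2023-W02": 11,
    "2023-W03": 6,     # e.g. end-to-end tests introduced this week
    "2023-W04": 4,
}

def weekly_cost(bugs: int) -> int:
    """Dollar cost of the engineer time spent on that week's bugs."""
    return bugs * HOURS_PER_BUG * LOADED_RATE

for week, bugs in weekly_bug_counts.items():
    print(f"{week}: {bugs} bugs ~= ${weekly_cost(bugs)}")
```

Even a crude dollar figure like this gives stakeholders a trend line they can connect to your testing milestones.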

Critical Mass

Say you’ve convinced the team and your testing framework is in decent enough shape that writing tests is easy. You’ve reached several thousand tests and you think all is well. Except you start hearing about flaky tests or skipped tests that keep cropping up. Alice needs to add a new widget and a completely unrelated test fails about half the time. This widget is business critical, so Alice (along with her manager or tech lead) decides that just this once she’s going to skip the test or rerun the CI workflow until it passes. Then it happens again with Bob’s important ticket. Then Conrad does the same thing with a really slow test that is taking forever to run. Suddenly your test suite is a leaky bucket where you’re losing code coverage quickly because tests are annoying engineers. So what do you do?

First and foremost you want to have visibility into the problem. Hearing about bad tests anecdotally is a good way to get your spidey-sense tingling, but not good enough to actually fix the problem and make sure it stays fixed. Get actual metrics on the issue. Many of the popular CI tools these days even have built-in or easy-to-integrate reporting that gives you details on which tests are flaky and which tests are slow. Make this automated and create a process around fixing these as soon as they come up. Once you have reporting in place, you first want to stop the bleeding. Sometimes that means dedicating time to fixing each broken test (I include slow tests in this definition), and sometimes the problem is so widespread that you have to declare testing bankruptcy and just delete huge parts of your testing suite. The decision is too situational for me to give advice here, but know that I’ve done both in different circumstances and have not regretted either decision.

My concrete recommendation is to get a periodic report (weekly worked best for me) including all tests that are slower than the threshold you deem acceptable, plus all tests that have failed in the last week or whatever heuristic you think best identifies a flaky test. Then assign someone (as often as possible the engineer responsible) to fix them as if they were bugs. This has the added benefit of making the engineering team aware of what can cause these issues. You can start a tradition of adding learnings into your documentation so that engineers can learn from each other and also have a place to read up on possible gotchas in the system. Make sure these test failures are not being assigned to the same folks over and over. It’s tempting to just assign them to the engineer who will fix them the fastest or has bandwidth, but you want these test breaks to feel the same as bugs.
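The core of that weekly report is simple aggregation. Here is a hedged sketch, assuming results have already been collected from CI runs (in practice you would parse JUnit XML files or your CI provider’s API); the threshold and the flakiness heuristic (a test that both passed and failed in the window) are illustrative choices:

```python
# Sketch of a weekly flaky/slow report over collected CI results.
# Each run is a list of (test_name, passed, seconds) tuples.
from collections import defaultdict

SLOW_THRESHOLD_SECS = 5.0  # illustrative; pick what your team deems acceptable

def weekly_report(runs):
    """Flag tests that flip between pass and fail, and tests that run slow."""
    outcomes = defaultdict(set)
    durations = defaultdict(list)
    for run in runs:
        for name, passed, seconds in run:
            outcomes[name].add(passed)
            durations[name].append(seconds)
    # Flaky heuristic: the test both passed and failed within the window.
    flaky = sorted(n for n, seen in outcomes.items() if seen == {True, False})
    slow = sorted(
        n for n, d in durations.items()
        if sum(d) / len(d) > SLOW_THRESHOLD_SECS
    )
    return {"flaky": flaky, "slow": slow}

runs = [
    [("test_widget", True, 0.2), ("test_checkout", False, 12.0)],
    [("test_widget", False, 0.2), ("test_checkout", False, 11.5)],
]
print(weekly_report(runs))
```

Whatever flags this report raises then get filed and assigned exactly like bugs, as described above.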

Test Environments

As I said above, I don’t want to get opinionated on specific test frameworks and what you should be looking for, but it might be worth a cursory overview of some features of good testing environments. The first thing I usually look for is whether you can run your entire test suite locally. Being able to run your tests locally (an important part of TDD) is much more efficient than pushing to CI and waiting for test feedback. If your test suite takes an unreasonably long time, or if you can’t even run it (whether because of missing configuration or environment limitations), you’ll make it that much harder on engineers who are making significant changes to the system. It can also be a sign that your application itself is getting too large and might need to be split up.

A good testing environment also lets you use the least amount of code and resources to maximize your testing coverage. That means using end-to-end tests for user flows and not for testing business logic. It means using a screenshot test for visual regressions but not for state changes. It means being able to test database changes in an integration test and not having to set up a staging environment (all silly examples, but you get my point). You want to have different tools to test different levels of your system, and make sure every engineer knows which tool is best suited to which situation. I’ve seen teams just get drunk on end-to-end tests (which are often much more expensive to run and change) when a unit testing suite would be just fine. I’ve also seen integration tests that try to simulate what a user would do when an end-to-end test would have made more sense. Identify what you need and make sure the team is aware of the strengths and weaknesses of each tool you have. Dev talks and mob-programming can be very helpful here. As often as you can, buy off the shelf rather than build in house. It’ll make it that much easier to get up and running, and you’re likely to get the benefit of community and vendor documentation rather than writing your own.
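To illustrate the “business logic belongs in the cheapest layer” point: logic like the hypothetical discount function below takes milliseconds to cover with a plain unit test, and nothing about it needs a browser or a staging environment. The function and its coupon rules are entirely made up for illustration.

```python
# Illustrative only: business logic like this belongs in a fast unit test,
# not behind an end-to-end browser flow. Function and rules are made up.
from typing import Optional

def discounted_total(subtotal: float, coupon: Optional[str]) -> float:
    """Apply a hypothetical coupon scheme to a cart subtotal."""
    if coupon == "SAVE10":
        return round(subtotal * 0.90, 2)
    if coupon == "FREESHIP" and subtotal >= 50:
        return round(subtotal - 5.00, 2)
    return subtotal

def test_discounted_total():
    assert discounted_total(100.0, "SAVE10") == 90.0
    assert discounted_total(60.0, "FREESHIP") == 55.0
    assert discounted_total(40.0, "FREESHIP") == 40.0  # below threshold
    assert discounted_total(40.0, None) == 40.0

test_discounted_total()
```

An end-to-end test should still check that a user can apply a coupon at checkout once, but every edge case of the coupon rules lives here.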

Many people differ on whether it’s worth running CI on every commit for near-instant feedback or waiting until after code review or some other condition. I’m not strongly opinionated, as long as you run the entire test suite on a development branch before allowing a merge into master, to avoid interfering with other engineers. Then run the entire test suite before deploying to production, both to make sure conflicts in code and logic haven’t made it in and as a sanity check. Failing test suites should block engineers from merging. I know this seems like overkill, or like I don’t trust individual engineers, but it’s actually cover for them so that they can avoid pressure to just “merge and fix after”. It also creates an opt-out culture rather than an opt-in one. Often called “The Default Effect”, this is a powerful psychological phenomenon used to influence policies as varied as organ donation and voter registration, and it works for engineering team testing policies too.

Engineers need to be able to easily debug failing tests both locally and in CI. That means choosing a framework with robust and verbose logging but also storing history so that patterns can be observed between runs. Some folks fall into the trap of logging too much which can actually make it more difficult to debug issues. A good framework has sensible defaults but also the ability to modify the level and nature of logging for your unique situation. It’s also helpful to be able to share logging easily. I’ve always preferred hosted CI for most of these reasons.

One topic I don’t often see mentioned is making sure you have a way of avoiding CI interruptions due to third party service issues. For example, does your CI call out to a vendor API and if that API broke would you be able to push code? Folks are conflicted on mocking third-party services vs using sandboxes (obviously never use your production service). What I will advise is having a very simple and accessible way of turning off the test or dependency such that you can keep on working in the event of an outage outside of your control. Some dependencies won’t have a solution (like if your CI vendor or version control goes down) but as often as possible, ask whether you can continue to test and deploy even with this outage by adding simple configuration options to turn off the dependency.
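One simple way to build that kill-switch is a single environment variable checked at the point where the client is constructed. Everything in this sketch is hypothetical (`VENDOR_API_ENABLED`, `VendorClient`, the fraud-check example); the point is only that flipping one flag in CI config keeps the pipeline moving during an outage:

```python
# Hypothetical kill-switch: one env var swaps the real third-party client
# for a deterministic stub so CI keeps working during a vendor outage.
import os

class VendorClient:
    """Stand-in for a real third-party API client."""
    def check_fraud(self, order_id: str) -> bool:
        raise RuntimeError("would call the real vendor API here")

class StubVendorClient:
    """Deterministic stub used when the dependency is switched off."""
    def check_fraud(self, order_id: str) -> bool:
        return False  # assume not fraudulent so tests can proceed

def make_vendor_client():
    """Honor the kill-switch; defaults to the real client when unset."""
    if os.environ.get("VENDOR_API_ENABLED", "true").lower() == "true":
        return VendorClient()
    return StubVendorClient()
```

During an outage, setting `VENDOR_API_ENABLED=false` in your CI configuration is the whole mitigation; no code change or redeploy of the test suite is needed.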

Other Practices

I could write an entirely separate piece on my thoughts on monitoring, alerting and logging and their respective uses. For now I’ll settle for making sure you realize I’m not advocating for testing instead of monitoring, alerting and logging. I think they’re important, but they solve a different problem than testing and the two are too often confused. Don’t rely on alerting to catch regressions you could have caught before they were deployed. Don’t rely on logging to see whether a certain condition happens when you could have written a unit test.

It’s important to note that many organizations believe in manual QA at some point in the development process. Manual QA is a great role and can be used very effectively when done right. But it’s not a replacement for automated tests, and when possible QA should avoid trying to catch small regressions or verify system correctness. QA is too valuable to waste on doing something that a test can do. Manual QA should be used to test issues that are extremely difficult or impossible to automate. Most importantly, it should be used to verify that the product meets its requirements and that new features adhere to the product and design spec.

Failure

I’ve failed at most of this advice at some point or another. It’s mostly how I came to the opinions I’ve mentioned. A testing culture is something you just have to continually work at. There’s never a “there” that you can get to. If you come in with the attitude that there is an end-goal you’ll disappoint your team when it never really ends. You should just start making incremental improvements and one day you’ll realize how far you’ve gotten as a team. Make testing a priority, demonstrate its value concretely and reward positive advancements and you’ll be shocked at how much support you’ll get from your engineering team and your organization over time.