Myths of test automation – debunked!

By Jim Grey (about)

wrote a post last year criticizing test automation when it’s used to cover for piles of technical debt and poor development practices. But I still think there’s a place for automation in post-development testing. There are two keys to using it well: knowing what it’s good at, and counting the costs. Without those keys it’s easy to fall prey to several myths of test automation. I aim to debunk them here.

Myth: Automation is cheap and easy

It is seductive to think that just by recording your manual tests you can build a comprehensive regression-test suite. But it never seems to really work that way. Every time I’ve used record and playback, the resulting scripts wouldn’t perfectly execute the test, and I’ve had to write custom code to make it work.

St. Paul's Episcopal ChurchWhat I’ve found is that it takes 3 to 10 times longer to automate one test than to execute it manually. And then, especially for automation that exercises the UI, the tests can be brittle: you have to keep modifying scripts to keep them running as the system under test changes.

I’ve done straight record and playback. I’ve created automated modules that can be arranged into specific checks. I’ve led a team that created tests on a keyword-driven framework. And I currently lead a team that writes code that directly exercises a product’s API. The amount of maintenance has decreased with each successive approach.

A side note: given the cost of automating one test, can you see that you want to automate only what you are going to run over and over again, because otherwise the investment doesn’t pay?

Myth: Automation can test anything, and is as good as human testing

Automation is really good at repeating sets of actions, performing calculations, iterating over many data sets, addressing APIs, and doing database reads and writes. I love to automate these things, because humans executing them over and over is a waste of their potential.

This gets at a whole philosophical discussion about what testing is. I think that running predetermined scripts, whether automated or not, is just checking, as in, “Let me check whether clicking Save actually saves the record.” This subset of testing just evaluates the software based on predefined criteria that were determined in the past, presumably based on the state of the software and/or its specification or set of user stories as they were then.

The rest of testing involves human testers experimenting and learning, evaluating the software in its context now. This is critical work if for no other reason than the software and its context (environment, hardware, related software, customer needs, business needs, and so on) changes. An exploring human can find critical problems that no automated test can.

I want human testers to be free to test creatively and deeply. I love automated checks because they take this boring, repetitive work away from humans so they have more time to explore.

Myth: When the automation passes, you can ship!

It’s seductive to think that if testing is automated, that passing automation is some sort of Seal of Approval that takes out all the risk. It’s as if “tested” is a final destination, an assurance that all bets are covered, a promise that nothing will go wrong with the software.

But automation is only as good as its coverage. And if nobody outside your automation team understands what the automation covers, saying “the automation passed” has no fixed meaning.

It’s hard to overcome this myth, but to the extent I have, it’s because as an automation lead and manager I’ve required engineers to write detailed coverage statements into each test. I’ve then aggregated them into broad, brief coverage statements over all of the parts of the software under test. Then I’ve shared that information — sometimes in meetings with PowerPoint decks, always in a central repository that others can access and to which I can link in an email when I inevitably need to explain why passing automation isn’t enough. Keeping this myth at bay takes constant upkeep and frequent reminders.

Myth: Automation is always ready to go

Hope Rescue Mission“Hey, we want to upgrade to the next version of the database in the sandbox environment. Can you run the automation against that and see what happens?”

My answer: “Let’s assume I can even run the automation in sandbox. If it passes, what do you think you will know about the software?” The answer almost always involves feelings: “Well, I’ll feel like things are basically okay.” See “When the automation passes, you can ship!” above.

Automation is software, full of tradeoffs aimed at meeting a set of implicit and explicit goals. Unless one of those goals was “must be able to run against any environment,” it probably won’t run in sandbox. The automation might count on particular test data existing (or not existing). It might not clean up after itself, leaving lots of data behind, and that might not be welcome in the target environment. It might depend on a particular configuration of the product and its environment that isn’t present.

Even in the environment the automation usually runs in, it might not be ready to go at a moment’s notice. Another goal would need to be, “must be able to run at any time.” There are often setup tasks to perform before the automation can run: a reset of the database the automation uses, or the execution of scripts that seed data that the automation needs.

Myth: Just running the automation is enough

When I run automated tests, part of me secretly hopes they all pass. That’s because when there’s a failure, I have to comb through the automation logs to find what happened, figure out what the automation was doing when it failed, and log into the software myself and try to recreate the problem manually. Sometimes the automation finds just the tip of a bug iceberg and I spend hours exploring to fully understand the problem. Some portion of the time, the failure is a bug in the automation that must be fixed. When it’s a legitimate product bug, then I have to write the bug in the bug tracker.

I am endlessly amused by how often I’ve had to explain that just running the automation isn’t the end of it: that if there are any failures, the automation doesn’t automatically generate bug reports. The standard response is some variation of “What? …ohhhhhh,” as it dawns on them. So far, thankfully, it has always dawned on them.

Myth: Automated tests can make up for years of bad development practices

I’ve just got to restate my point from my older post on this subject. If your development team doesn’t follow good practices such as writing lots of automated unit tests (to achieve about 80% code coverage), code reviews, paired testing, or test-driven development, automation from QA is not going to fix it. You can’t test in quality — you have to build it in.

If you’re sitting on a messy legacy codebase, one where your test team plays whack-a-mole with bugs every time you make changes to it, you are far, far better served investing in the code itself. Refactor, and write piles of automated unit tests.

You want on the order of magnitude of thousands of automated unit tests, hundreds of automated business-rule tests (which hopefully directly exercise an API, rather than exercising a UI, for resiliency and maintainability), and tens of automated checks to make sure the UI is functioning.

I’ll belabor this point: Invest in better code and better development practices first. When you deliver better quality to QA, you’ll keep the cost of testing as low as possible and more easily and reliably deliver better quality to your customers and users.

Advertisements

9 thoughts on “Myths of test automation – debunked!

  1. Good read Jim. I’ve been in semiconductor verification for 20+ years now and I’ve always wondered how software is verified/tested versus how we do it for chips.

    Like

    1. The snarky answer to “how is software tested” is: badly, usually. I say it over and over: you have to engineer in quality, you can’t test it in. You can’t polish a turd.

      Like

  2. In processors, your state space is so huge (try calculating it some day) that we live and breathe by coverage based random testing done 24×7 for months. We also have the problem that “shipping” a defective design is fantastically expensive and very difficult to “patch” in the field.

    Like

    1. Yeah, your reality is very different from mine. You would be shocked by the relatively thin testing I do in my world before shipping. But we can do it because we can patch so quickly and inexpensively – especially relative to your world.

      Like

  3. 80% test coverage? Wow, you’re really aiming low … hold on, yeah we’re only at 78% at the moment. :/ And we were at 63% in November when I started at my current job. But yes, manual testing is always required. It’s for all those things that we the developer don’t think about.

    Like

    1. One thing about the 80% metric is that it puts the developers in a mindset of testing. It’s not just the coverage — it’s the culture: we developers take full responsibility for our own work. And then the testers are free to not just make sure the developers met the requirements or fulfilled the user stories — they can do the more interesting work of testing interactions among features, or making sure the system scales, or seeing what happens when production-like load is applied.

      Like

      1. Yes! And my last gig we even went as far as having another developer do the manual testing of a feature before handing it over to the dedicated tester so that he was free to do the more interesting thing.

        Like

  4. I’m a bit shocked by how much manual testing is done. I don’t know how you can test complex systems without invoking a lot of randomly generated stimulus and measuring your test coverage metrics.

    Like

    1. Enterprise software and consumer software have a different success profile, especially in the modern software as a service world. First, the paths users will take through the software are not random; they fall neatly under a bell curve. So you test the stuff directly under the tallest part of the curve, and touch on the high-risk potential failures across the rest of the curve. Then, in a well-designed SaaS environment, I can deliver a fixed bug to Production in minutes. That lets companies run with a mindset of deliver fast, but fix failures fast. Clearly you don’t want to deliver crap, Severity 1 bugs all over the place, as that erodes customer confidence in a hurry. But it doesn’t have to be perfect going out the door.

      That said, at every software company I’ve ever worked for — if we built bridges, I wouldn’t drive over one we built. 🙂

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s