Visual regression testing with Wraith

Posted in: Design, Development

Last week I talked about how we chose Wraith as our visual regression testing tool.

After lots of discovery and preparation, we were now ready to get our first real tests up and running.

Example of a failed test in Wraith. Differences are highlighted in blue – in this case, I've modified the summary slightly.

In this post, I'll go through how we run our tests locally and as part of our continuous integration process. But go make a cup of tea first, because this will be a long one...

Running tests locally

We wanted to be able to compare what pages looked like before and after code changes. Wraith has two options for this:

  • wraith capture compares screenshots of the same pages on two different domains
  • wraith history and wraith latest take screenshots of the same pages at different points in time, so you can compare screenshots of a page before and after you've changed your code

wraith capture can be quicker to get started with, but it takes longer to run, because it has to take two full sets of screenshots every time.

Alternatively, once you've run wraith history to take your baseline screenshots of pages before the changes, you can run wraith latest as many times as you want to only take the new screenshots. So if you need to run the tests often, this can be a more efficient way of working.
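
To give a flavour of that second workflow, this is roughly what it looks like on the command line (the config file name here is just an example – you point Wraith at whichever YAML file holds your settings):

    # Take the baseline screenshots once, before making any changes
    wraith history configs/history.yaml

    # ...make and deploy your design changes...

    # Take new screenshots and compare them against the baseline
    wraith latest configs/history.yaml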

Having two different options might sound like a pain to set up, but writing and running our first Wraith tests was a piece of cake. I had working tests on my virtual machine within a day.

An example of a Wraith config file for comparing pages across two domains
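
Our real config is longer, but a minimal capture config looks something like this – the domains and paths below are placeholders rather than our actual ones:

    # configs/capture.yaml – compare the same paths on two domains
    browser: "phantomjs"

    domains:
      production: "https://www.example.com"
      staging: "https://staging.example.com"

    paths:
      home: /
      guide: /guides/example-guide

    screen_widths:
      - 320
      - 768
      - 1280

    directory: "shots"
    fuzz: "20%"
    threshold: 5

    gallery:
      template: "slideshow_template"

    mode: diffs_only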

Once the tests were in place, running them was as easy as typing a single command in the command line.
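
With a capture config like the one above, that command is simply:

    wraith capture configs/capture.yaml

Wraith then screenshots every path on both domains, generates the diffs, and builds a gallery page summarising the results.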

I wrote up some docs for the rest of the team so they could easily get the tests running on their own virtual machines.

The next step was to automate as much of this process as possible.

Running tests as part of continuous integration

We use Bamboo to run our builds and deployments as part of our continuous integration and deployment workflow.

It made sense to get Bamboo to run our visual regression tests for us. We decided to use wraith history to take baseline screenshots of pages on staging as they should be, and then run wraith latest every time we modified the design to see the effect of our changes.

We ended up with two separate builds in Bamboo – one to set the baseline screenshots, and one to take new screenshots and run the tests. Both builds run through similar, but not identical, steps.

Steps for the baseline screenshot build

  1. Check out the test code from our GitHub Enterprise repository.
  2. Republish all the test pages.
  3. Install dependencies (like Wraith, PhantomJS and ImageMagick).
  4. Take the baseline screenshots.
  5. Save the baseline screenshots as an artefact so they can be accessed by other builds.
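
In Bamboo, most of these steps are plain script tasks. Stripped of the project-specific details, the heart of the baseline build is something along these lines (the paths and comments are illustrative):

    #!/bin/sh
    set -e
    # Baseline build – Bamboo has already checked out the test repo for us

    # (a separate task republishes the flagged test pages – more on that below)

    # Install the tools the tests need; PhantomJS and ImageMagick are
    # installed here too if the build agent doesn't already have them
    gem install wraith

    # Take the baseline screenshots
    wraith history configs/history.yaml

    # A Bamboo artefact definition then saves the shots/ directory
    # so the test-running build can copy it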

Steps for the test-running build

  1. Check out the test code.
  2. Republish all the test pages.
  3. Copy the latest baseline screenshots from the other build.
  4. Install dependencies.
  5. Take new screenshots and compare them against the baseline screenshots for changes.
  6. Save all the screenshots and a webpage reporting on the status of each test.
  7. Trigger a Slack webhook to notify us whether the tests have passed or failed, including a link to the results.
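
The test-running build is almost a mirror image. A simplified version of its main script task looks something like this (the notification script name is made up; with a threshold set in the config, Wraith exits with a failure code when the diffs exceed it):

    #!/bin/sh
    # Test-running build – an earlier task has already copied the baseline
    # screenshots from the other build's artefact into shots/

    gem install wraith

    # Take new screenshots and diff them against the baseline.
    # Record the result rather than aborting, so we can still notify Slack.
    if wraith latest configs/history.yaml; then
      RESULT=passed
    else
      RESULT=failed
    fi

    # Bamboo saves shots/ (screenshots, diffs and the gallery page)
    # as this build's artefact

    # Tell Slack how it went – the script itself is shown further down
    ./notify_slack.sh "$RESULT"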

Every time we deploy design changes to our staging environment, we run the test build and it checks all our test pages for differences.

Once the changes have been accepted, we update our baseline screenshots to use the latest approved version of the design.

This is cool. But we faced a lot of challenges and decisions along the way. I'll go into some of them now.

Deciding when to run the tests

You've probably noticed that this process isn't as automated as it could be. Right now, both of those builds only run when someone in the team tells them to.

Our application's integration and feature tests automatically run every time we push a change to staging. If those tests fail, the changes don't finish deploying. Why not do the same here?

One issue is that a difference in the design isn't automatically a failure – some of those changes will be intended. So if the visual regression tests fail, we don't necessarily want to derail the deployment entirely – it just means a human needs to look at the results and see if those differences are okay or not.

Another issue is that making a visual change often means modifying multiple code repositories. For example, we have one repo for our CSS styles and another for our Content Publisher page templates. We need to test the effect of those changes across both repos, not just individually. So triggering the tests based on a push to a single repository might mean we don't test the entire consequences of a change.

One way to solve this problem would be to rely on Pivotal Tracker, the project management tool we use for the Content Publisher. Pivotal's activity webhooks would potentially let us run our builds every time a story changes status – for example, once the reviewer delivers it. So this is definitely something we'll look into adding.

Running automated tests against static pages

Our Content Publisher generates static pages. This means that if we ever change the HTML structure of those pages, we need to regenerate the pages for those changes to take effect.

Manually republishing every piece of test content every time we wanted to run our tests wasn't an option.

Fortunately we'd already written a method to handle this sort of bulk republishing. Any time we make a design change and need to update our pages, we can run this in the Rails console to republish any pages which meet a certain condition (for example, they use the 'Guide' content type).

I added the "Republish all the test pages" stage to both our builds to run this method as a rails runner one-liner. This stage now republishes any already-live pages which we've flagged as being test content. One less thing to worry about!

Customising our Slack notification

Setting up a Slack notification from Bamboo is pretty straightforward. But we wanted to customise our notifications to link straight to the results.

After reading this helpful post about sending status reports from Bamboo to Slack, we wrote a shell script which checks whether the tests have passed and then sends the relevant notification via a webhook.
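
The script itself boils down to a conditional and a curl call. A simplified version, with the webhook and results URLs as placeholders, looks like this:

    #!/bin/sh
    # notify_slack.sh – post the test result to Slack via an incoming webhook
    RESULT="$1"   # "passed" or "failed", passed in by the build script
    WEBHOOK_URL="https://hooks.slack.com/services/T000/B000/XXXX"      # placeholder
    RESULTS_URL="https://bamboo.example.com/path/to/the/test/report"   # placeholder

    if [ "$RESULT" = "passed" ]; then
      TEXT="Visual regression tests passed: $RESULTS_URL"
    else
      TEXT="Visual regression tests failed – the results need a human review: $RESULTS_URL"
    fi

    curl -s -X POST -H 'Content-type: application/json' \
      --data "{\"text\": \"$TEXT\"}" "$WEBHOOK_URL"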

A custom Slack notification, including a link to the test report.

Continuing to improve our tests

Our visual regression tests are now working and integrated into our workflow.

It feels great to have all this in place, but there's still plenty of work to be done. Our to-do list now includes:

  • improving our range of test content
  • writing tests for individual elements, rather than entire pages
  • automatically triggering the tests, probably through Pivotal's activity webhooks
  • testing across a greater range of browsers

But now that we've got our essentials set up, making these improvements will hopefully be much more straightforward. And our team and workflow will be well-equipped to stop design bugs from happening before users ever get a chance to see them.

Responses

  • Wow, this is really impressive - well done all!

    • Thanks Phil! Took a little longer than expected to get it all working, but I think the results will be worth it.

  • That was a really interesting post. I'm looking at visual regression testing at the moment and the bamboo information is handy.
    Well done.
    Haydn