The last time it was declared broken, it was Web browsing behaviour that totally invalidated ‘traditional’ usability testing: this was soon fixed with new metrics, but now there’s another elephant in the usability lab

Does anyone remember this game-changing finding from research reported in the 1999 classic ‘Web Site Usability’ (J. Spool et al.):

“Companies have been usability testing software for years. Our firm alone has conducted thousands of usability tests on hundreds of software and hardware products. We assumed that web sites would be just another form of software, and could be tested similarly. Boy, were we wrong! The web presents lots of problems that we’ve never seen before, which make it hard to define what usability even is, let alone measure it.” (the emphasis is mine)

The most spectacularly broken usability metric discovered here was ‘user preference’.

Up until this point, if a user in a comparative usability test told you that they liked using one of the things you were getting them to test a lot more than another, your test results would show that this was because they were able to use the thing they liked most much more effectively and efficiently than the thing they liked least.

In testing jargon, ‘user preference was a reliable proxy for usability’.

But the surprising finding reported by these researchers was that with Web pages, the doors came flying right off this reliable old ‘user preference correlation’.

The websites that the users liked the best included ones where they got completely and irretrievably lost on the site.

The websites that they liked least included ones where they had no such problems and where they had successfully used the site exactly as the site owner and developer intended.

Many other previously well-understood hallmarks of usability were uncannily absent from the test results.

On the three ISO-specified parameters of usability (ISO 9241-11), i.e. effectiveness, efficiency and satisfaction, some really important web sites were scoring something like this:

  • Effectiveness: 0%
  • Efficiency: 0%
  • Satisfaction: 100%
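To make these three parameters concrete, here is a minimal, purely illustrative sketch (in Python) of how effectiveness, efficiency and satisfaction might be scored from test-session data. The field names, the 1–5 satisfaction scale and the efficiency benchmark are my own assumptions, not anything prescribed by ISO 9241-11 or by Spool’s team; the point is simply to show how a site can quite legitimately score 0%, 0% and 100%.

    # Hypothetical scoring of the three ISO 9241-11 usability parameters from
    # usability-test session data. Field names and formulas are illustrative
    # assumptions only, not a standard implementation.

    def score_sessions(sessions):
        """Return (effectiveness, efficiency, satisfaction) as percentages."""
        # Effectiveness: share of sessions in which the intended task was completed.
        completed = sum(s["task_completed"] for s in sessions)
        effectiveness = 100 * completed / len(sessions)

        # Efficiency: completed tasks per minute on task, relative to an assumed
        # benchmark of one completed task per minute, capped at 100%.
        total_minutes = sum(s["minutes_on_task"] for s in sessions)
        rate = completed / total_minutes if total_minutes else 0.0
        efficiency = min(100.0, 100 * rate / 1.0)

        # Satisfaction: mean self-reported rating, rescaled from 1-5 to 0-100%.
        mean_rating = sum(s["satisfaction_1_to_5"] for s in sessions) / len(sessions)
        satisfaction = 100 * (mean_rating - 1) / 4

        return effectiveness, efficiency, satisfaction

    # The paradoxical web result described above: nobody completes the task,
    # nobody is fast, yet everybody reports loving the site.
    sessions = [
        {"task_completed": 0, "minutes_on_task": 12.0, "satisfaction_1_to_5": 5},
        {"task_completed": 0, "minutes_on_task": 9.5, "satisfaction_1_to_5": 5},
        {"task_completed": 0, "minutes_on_task": 15.0, "satisfaction_1_to_5": 5},
    ]
    print(score_sessions(sessions))  # -> (0.0, 0.0, 100.0)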

Clearly, usability practitioners needed to go back to the drawing board.

They desperately needed to find out the implications of:

  • liking pages you couldn’t navigate or were confused by
  • liking web pages where you never managed to do whatever it was the website owner wanted you to do
  • not liking websites where you got things done exactly as intended by both yourself and the site owner/designer

The reasons why satisfaction parted company with usability in the case of web usage were simple:

  • measuring usability had been predicated on measuring it against intended use
  • efficiency, effectiveness and satisfaction were measures related to productivity, and web surfing wasn’t a productivity activity

It turns out that the relationship between intent (on the part of either the provider or consumer of an experience) and the resulting experience itself is complex in an environment where the ‘encounter with the experience’ is itself driven by factors other than ‘clearly defined intent’.

As if this wasn’t enough of a problem, the user may not have a clearly conceived reason for clicking on a link to a site, and so when they arrive they might not actually know, remember, or even (gasp!) care how, or (more gasps!) why, they got there.

Anyway, that was then, this is now.

Usability labs have since added mobile and social apps to the range of things that they test, and the labs themselves are much more commonplace (today’s usability guru Jakob Nielsen has been largely responsible for promoting usability testing of websites).

But for some reason, the mobile and social aspects of whatever they happen to be testing are not being taken into account in terms of the way that the tests are being conducted.

Usability testers may be testing social apps (in many senses every Internet-connected app is social, and in practice just about everything today is more socially mediated through technology than it was before), but testers are treating the tests as if they can be successfully conducted in a social vacuum: a test environment that is as innocent of the inevitable impact of the test subject’s social context as it was ignorant of the ‘intent issues’ that mystified the Web usability researchers of the late 1990s.

When you test someone’s interaction with your UI, who else is indirectly involved?

    1. Who else is going to influence (either consciously or subliminally) the decisions that the user is expected to make during the test (and also during a non-test user experience)?
    2. Which decisions will not be taken (e.g., optional fields left blank, choices abandoned, hedged, obfuscated) by the user if ‘consultation of the (intended/expected/habitual) co-decision maker’ is not possible, both in the test and in a non-test situation?
    3. What aspects of the test cause the user stress (manifested by hesitation, errors or distress) simply because the user would normally involve another person who is not present/cannot be consulted during the test, but who would have been consulted ‘in real life’?
    4. How is the social context of the test setting/timing/’state of the test subject’s life/relationships/work’ going to influence the user’s behaviour during the test/non-test situation?
    5. Who else is the user thinking about during the test? Is this different to who (and how, why and whether) they would have been thinking about (and behaving under the influence of/in the context of/in consideration of/in deference to/in defiance of) if this was not a test?
    6. Who else that the user knows will be impacted by the answers given in the test (irrespective of whether they would have been consulted/notified)?
    7. To what extent would the social context be different for that user if the experience being tested was not a test but was real?
    8. How are prospective/actual test subjects supposed to interact/not interact with one another before, during and after the test?
    9. Should the test subject be asked not to try to contact other test subjects?
    10. How will the ‘test taking event’ (and the run-up to it and follow up from it) impact upon the user’s social media communications?
    11. Will the user be expected to, allowed to, prevented from having social media access during the test?
    12. Will the user want to ‘consult someone using social media’ during the test in order to respond to questions in the test (irrespective of whether they are allowed to do this)?

If these issues are not taken into consideration, the following could result:

The test findings could be misleading about the extent to which the experience being tested truly represents how the test subject, and the intended user, actually behave in a comprehensively socially mediated environment.

In other words, the results of the test would only be meaningful in a world which bears no resemblance to the one in which either the test subject or the intended userbase exists.

Notice that ‘socially mediated’ in this context is not merely a reflection of the state of current technology. Many (though obviously not all) of the concerns raised in the questions above have been present for at least as long as usability testing has existed; in the past they were simply confined to less technologically sophisticated channels (like just talking to one another face to face).

This means that even the excuse of ‘failure to keep up with modern technological developments’ (an embarrassing one for practitioners as technology-sensitive as usability testers) is not enough to cover the iniquity of overlooking a basic fact: when you use a product, even when you deliberately try it while nobody else is around, the experience will almost inevitably both be shaped by and shape your interactions with other people.

Somehow, usability testing has managed to overlook this. Broken, it is.