Usability testing is the practice of observing real people as they attempt to use a document, interface, or system — then revising based on where they succeed, hesitate, misread, or fail. In writing, usability testing treats a document as a tool the reader uses to accomplish something, and evaluates whether that tool works.

Janice Redish argued that written content should be tested the same way software is tested: through observation, not self-report. Readers who struggle with a document often can’t articulate what went wrong — they may blame themselves, skip sections, or misunderstand without knowing it. Watching them use the document reveals problems that surveys and self-assessment miss [@redish2012].

Karen Schriver’s protocol-aided audience analysis is the most rigorous form of usability testing for writing: readers think aloud while using a document, and the observer records where comprehension breaks down [@schriver1997]. The vault’s plain language specification includes two simplified versions:

  • The paraphrase test (section 12.2): ask a reader to explain a passage in their own words. If they can’t, revise.
  • The task test (section 12.3): give a reader a realistic task and watch them use the document to complete it. Note where they hesitate, reread, or fail.
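The task test produces a stream of observations that is easy to tally. As a minimal sketch (the section names, event labels, and threshold are illustrative assumptions, not part of the specification), a session log might be aggregated like this to decide which passages to revise:

```python
from collections import Counter

# Hypothetical task-test session log: each entry records which section
# of the document a reader was using and what happened there.
# "hesitate", "reread", and "fail" are the events the task test watches
# for; "ok" means the reader moved through without trouble.
observations = [
    ("Installation", "ok"),
    ("Installation", "hesitate"),
    ("Configuration", "reread"),
    ("Configuration", "fail"),
    ("Configuration", "fail"),
    ("Usage", "ok"),
]

def sections_to_revise(obs, threshold=2):
    """Flag sections where problem events (anything but 'ok')
    occur at least `threshold` times across readers."""
    problems = Counter(sec for sec, event in obs if event != "ok")
    return [sec for sec, n in problems.items() if n >= threshold]

print(sections_to_revise(observations))  # ['Configuration']
```

The threshold guards against over-reacting to a single reader's stumble; what counts as "enough evidence to revise" is a judgment call, not something the test itself dictates.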

Usability testing for writing differs from literary criticism or editorial review in a fundamental way: it measures whether the document works for its reader, not whether it’s well-crafted by the standards of its discourse community. A technically accurate document that readers can’t navigate is a usability failure, regardless of its prose quality.

In practice, usability testing reveals common failure patterns:

  • Readers don’t read sequentially. They scan headings, jump to what looks relevant, and backtrack when confused. Documents structured for front-to-back reading often fail scan-readers.
  • Readers skip what they think they already know. If a section title sounds familiar, they skip it — even if the content differs from their assumptions.
  • Readers misinterpret terminology. Terms that seem clear to the writer may carry different connotations for readers from different discourse communities.
  • Six to nine readers reveal most major problems. Small-sample testing is sufficient to identify structural and comprehension failures; large samples are needed only for statistical comparison between versions.
Related:

  • protocol-aided audience analysis — the most rigorous form of usability testing for documents
  • think-aloud protocol — the research method underlying protocol-aided testing
  • audience — usability testing is the feedback-based method of audience analysis
  • task analysis — usability testing validates whether task-structured content actually works
  • readability — readability formulas measure surface features; usability testing measures whether the document actually works
  • accessibility — usability testing with diverse users reveals accessibility failures
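The small-sample claim above (six to nine readers reveal most major problems) follows from the standard problem-discovery model: if each reader independently detects a given problem with probability p, the expected fraction of problems found by n readers is 1 − (1 − p)^n. A minimal sketch, where the p values are illustrative assumptions rather than measured detection rates:

```python
# Expected fraction of problems found by n readers, assuming each
# reader independently detects a given problem with probability p.
def fraction_found(n, p):
    return 1 - (1 - p) ** n

# The detection probabilities below are illustrative, not measured.
for p in (0.2, 0.3, 0.5):
    found = [round(fraction_found(n, p), 2) for n in (3, 6, 9)]
    print(f"p={p}: n=3,6,9 -> {found}")
```

Even at a modest per-reader detection rate of 0.3, nine readers are expected to surface roughly 96% of problems, which is why small samples suffice for finding failures while statistical comparison between versions needs far larger ones.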