Why Agencies Struggle with Existing Testing Platforms

Every agency I have worked at has had the same conversation about testing. It comes up after a release goes sideways, or a client asks why something they reported as fixed three months ago is broken again. Someone suggests a testing platform. A few tools get evaluated. Maybe one gets trialled for a sprint or two. Then it quietly stops being used, because nobody has the hours to maintain it alongside the actual deliverables.

The problem is not just structural. It is also a matter of time. Agencies almost never have the budget or the breathing room to spec out proper testing. Automated testing tools sound great in principle, but they need the full context of what you are building to be useful, and maintaining the test code itself is a massive time sink. At every agency I have been part of, testing gets deprioritised because there is always something more urgent that is actually billable. The irony is that clients often want to see testing done. They just do not want to pay for the overhead of setting it up.

Time kills testing culture

The fundamental tension in agency work is that testing is an investment, and agencies bill for delivery. When a sprint is tight and the client is waiting for a demo, writing test cases for features that already work is the first thing that gets cut. Not because anyone thinks it is unimportant, but because it is never the most urgent thing in the room.

This creates a pattern where testing only happens reactively. Something breaks. Someone scrambles to verify the fix. Maybe they check a few related things while they are at it. But there is no systematic test suite being built up over time, because building one requires hours that the project budget does not account for.

The tools themselves make this worse. Automated testing platforms like Cypress or Playwright are powerful, and both ship recorders that can produce an initial test script without hand-writing code, so getting the first test written is cheap. The maintenance burden is where the cost sits. Once a test exists, it needs updating when features change, debugging when it fails for the wrong reasons, and someone who understands the full context of what it is supposed to verify. For a product company with dedicated QA engineers, that is manageable. For an agency juggling six client projects with two-week sprints, it is a luxury that never materialises.

Clients want testing, but the feedback loop is broken

Here is the pattern I have seen play out repeatedly. The client finds a bug. They report it in Slack, or during a standup, or by adding a comment to a Jira ticket. A developer picks it up, fixes it, marks the ticket as done. The client confirms it works. Everyone moves on.

Six weeks later, the same bug reappears. Nobody can find the fix from last time because it was buried in a sprint that has long since been archived. The context of why it broke and how it was resolved is scattered across Slack messages, ticket comments, and a commit message that says "fix export bug." Or, honestly, "all recent work." I have written that one myself more times than I would like to admit.

The deeper issue is that clients often help with testing, especially in UAT phases, but there is no easy way to collaborate on it. Their feedback gets shoehorned into a Jira or ClickUp ticket, eventually marked as done, and it never becomes a regression test. The knowledge of what broke and why evaporates. The next time that area of the codebase is touched, the team is starting from scratch.

Spec-based testing tools miss the context

Automated testing tools assume you already know exactly what to test. They give you a framework for writing assertions, but they do not help you figure out what assertions matter. That requires business context that lives in someone's head, or in a Slack thread from last October, or in a client brief that was accurate three months ago.

A developer knows what a component does technically. They know the API contract, the state management, the UI interactions. But do they know why the client wanted a three-step checkout instead of a single page? Do they know that the reporting dashboard is used by franchise owners who need larger touch targets? Do they know that the data export feature exists specifically because the client's accountant uses a legacy system that only accepts CSV?

Without that context, tests verify technical correctness without validating business intent. The feature works. The test passes. The client is still unhappy because the thing they actually needed was something subtler than what the test is checking.

The problem is even worse for AI coding agents. They can generate technically valid test cases from a component's props and API surface, but they have no idea what the feature is for or who uses it. The tests they produce are structurally correct and functionally meaningless.

What would actually work

The fix is not a better testing tool bolted onto the side of your project management system. It is a testing approach that is connected to the specifications that define what you are building and why, and that does not require developers to maintain a parallel codebase of test scripts.

Tests should be linked directly to features, not to tasks. Features should carry their business context, acceptance criteria, and user stories in a structured format that both humans and AI agents can read. When a feature changes, the linked tests should surface as needing review. When a test run completes, the results should be shareable with the client in a format they can actually understand without needing a developer to walk them through it.

Client feedback should have a clear path into the test suite. When a client reports a bug during UAT, that feedback should become a test case that lives alongside the feature spec, not a ticket comment that gets archived and forgotten. The next time anyone touches that feature, the test is right there.
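One way to picture that path, as a rough sketch: the FeatureSpec and RegressionTest classes and the record_feedback helper below are hypothetical names, and the bug report itself is invented for illustration. The point is only that the feedback is stored on the feature spec rather than in a ticket.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegressionTest:
    """A test case created from a client report (hypothetical model)."""
    title: str
    steps: List[str]
    source: str  # where the report came from, e.g. "UAT"


@dataclass
class FeatureSpec:
    """A feature spec that owns the regression tests raised against it."""
    name: str
    regression_tests: List[RegressionTest] = field(default_factory=list)


def record_feedback(spec: FeatureSpec, summary: str, steps: List[str],
                    source: str = "UAT") -> RegressionTest:
    """Turn a piece of client feedback into a test stored with the feature."""
    test = RegressionTest(title=f"Regression: {summary}", steps=list(steps), source=source)
    spec.regression_tests.append(test)
    return test


checkout = FeatureSpec(name="Three-step checkout")
# Illustrative report; the bug and its steps are made up for this example.
record_feedback(
    checkout,
    "Back button loses cart contents",
    ["Add an item to the cart", "Go to step 2", "Press back", "Cart still shows the item"],
)
```

Anyone who opens the checkout spec later sees the regression test next to the acceptance criteria, instead of having to search archived sprints for it.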

And the AI should do the heavy lifting. If the feature spec exists with proper acceptance criteria and business context, generating the test cases should not be a manual process. The AI reads the spec, works from what the feature is supposed to do, and drafts the tests. A human reviews and approves. That takes minutes, not hours.

This is what we built Specsource around. Tests live alongside the feature specs that define what you are building, linked to the business context and acceptance criteria that make them meaningful. The AI drafts test cases from those specs so you are not starting from a blank page, and clients can see the results through shareable reports without needing a login to your internal tools.

Testing should not be a separate discipline that agencies aspire to and never quite achieve. It should be a natural byproduct of specifying what you are building clearly enough that testing becomes obvious.