Deciding How to Test
Something I've seen expressed time and time again is the feeling that a software codebase lacks testing and needs more tests. This is true at most companies, so a respectable course of action might be to mandate that every feature change include tests and that code coverage metrics never regress. But I think far less energy is spent thinking about the usefulness of testing and whether we can write tests with a high return on investment.
So what is a useful test anyway? By my definition, usefulness is how closely a test simulates how a real user would use the feature. In an ideal world, the best test you could possibly run would be to ship your feature to a fleet of real users, have them try it by hand, and get their feedback immediately. You cannot get any better than this. A test that comes closer to this is, by definition, more useful. It follows that UI integration tests tend to rank higher on the usefulness scale, since they exercise simulated user interactions on the UI, while unit tests rank lower, since they test APIs that are pretty divorced from the end-user experience.
Obviously, we cannot rely on a fleet of real users to test every code change. And it is too costly to write UI integration tests for every possible branch of logic in some obscure corner of the business logic. Choosing a testing approach is always a trade-off between usefulness and cost, where I define the cost of testing as the sum of:
- Development cost: e.g. writing the test, setting up mocks, fixtures, services, etc.
- Maintenance cost: e.g. maintaining test infrastructure, battling flakiness, ensuring tests are performant, etc.
- Execution cost: e.g. how long it takes to run the tests.
So when choosing a testing approach, I believe the objective should be to find the most useful testing approach at a cost that is reasonably acceptable for the business context.
With this wisdom, I think it follows that you should probably prioritize some form of end-to-end integration testing as your main testing strategy, then supplement with more targeted unit testing where end-to-end tests become too costly. You can apply the same thinking within your unit tests by opting for more broadly specified tests with minimal mocking: for example, testing the API handler with only the network layer mocked, rather than testing a helper function with all of its dependent functions mocked out.
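To make the contrast concrete, here is a minimal sketch of the broad style in Python with Flask and pytest. The `/users` endpoint, `create_app` factory, and in-memory store are hypothetical stand-ins, not anything from a real codebase; the point is that the test drives the real handler, routing, and serialization, substituting only the outermost dependency.

```python
import pytest
from flask import Flask, jsonify, request

def create_app(store):
    app = Flask(__name__)

    @app.post("/users")
    def create_user():
        payload = request.get_json()
        if not payload.get("email"):
            return jsonify(error="email is required"), 400
        user_id = store.save(payload["email"])
        return jsonify(id=user_id, email=payload["email"]), 201

    return app

class InMemoryStore:
    # Stand-in for the real persistence layer: the only thing mocked.
    def __init__(self):
        self.users, self.next_id = {}, 1

    def save(self, email):
        user_id, self.next_id = self.next_id, self.next_id + 1
        self.users[user_id] = email
        return user_id

@pytest.fixture
def client():
    return create_app(InMemoryStore()).test_client()

# Broad tests: hit the endpoint the way a real client would and
# assert on the user-visible contract, not internal structure.
def test_create_user_returns_created_user(client):
    resp = client.post("/users", json={"email": "a@example.com"})
    assert resp.status_code == 201
    assert resp.get_json()["email"] == "a@example.com"

def test_missing_email_is_rejected(client):
    resp = client.post("/users", json={})
    assert resp.status_code == 400
```

A test like this keeps passing across internal refactors as long as the user-visible contract holds.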
Unfortunately, I think some developers are still stuck in the dogma that unit tests must be narrowly specified, and that it is better to have a ton of small tests independently exercising isolated parts of the codebase than broader tests that exercise a higher-level interface. This might work in some theoretically optimal codebase, say a rigorously typed, purely functional one, but in practice it makes your codebase frustrating to refactor and produces tests that are not helpful in catching regressions for the end-user. A smell test for whether you have too many of these useless tests: when a test case fails, is it difficult to tell whether you've actually impacted the end-user experience?
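For contrast, here's a sketch of the narrow style applied to the same hypothetical feature. Every collaborator is a mock, so the test pins down internal call structure; inline the validator or rename the store method during a refactor and it fails, even though nothing a user can observe has changed.

```python
from unittest.mock import Mock

# Hypothetical helper under test: pure glue code.
def create_user_helper(email, validator, store):
    if not validator(email):
        raise ValueError("invalid email")
    return store.save(email)

# Narrow test: every collaborator is mocked, so the assertions are
# about which calls happened rather than about user-visible behavior.
def test_create_user_helper_calls_collaborators():
    validator = Mock(return_value=True)
    store = Mock()
    store.save.return_value = 1
    assert create_user_helper("a@example.com", validator, store) == 1
    validator.assert_called_once_with("a@example.com")
    store.save.assert_called_once_with("a@example.com")
```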
Partly to blame for this unit testing dogma is that we've decided to categorize tests under these arbitrary 'integration' and 'unit' labels. I've been in bikesheddy discussions about what constitutes a 'unit' in a unit test. Should it be a class? A function? An API handler? I've also argued with people who insist that the kind of broadly specified testing I advocate for is not a 'true' unit test but an integration test, because I am testing things outside the perceived 'unit', like making real queries against a test database in a unit test. (For the record, I believe this is totally fine as long as the tests are fast and predictable.) At this point we're just arguing semantics, and our users don't care whether what I've written is the Platonic ideal of a unit test.
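For what it's worth, here's the shape of test I'm defending, sketched with Python's built-in sqlite3 and an in-memory database; the `users` schema and the `add_user`/`get_user` functions are hypothetical. It issues real SQL against a real database engine, yet it is fast, deterministic, and needs no mocks.

```python
import sqlite3
import pytest

# Hypothetical data-access functions under test.
def add_user(conn, email):
    cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()
    return cur.lastrowid

def get_user(conn, user_id):
    row = conn.execute(
        "SELECT id, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"id": row[0], "email": row[1]} if row else None

@pytest.fixture
def conn():
    # Real database, real SQL -- but in memory, so every test gets a
    # fresh store and runs in milliseconds with no shared state.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    yield conn
    conn.close()

def test_user_round_trip(conn):
    user_id = add_user(conn, "a@example.com")
    assert get_user(conn, user_id) == {"id": user_id, "email": "a@example.com"}
```

Call it a unit test or an integration test; either way, it fails only when behavior a user depends on actually breaks.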
In general, I don't like the term 'unit' since it comes with baggage around what you are and are not supposed to do, and conversations start to veer into 'best practices' rather than what's good for the user. I would much prefer a categorization like local in-memory tests vs. distributed tests, because my approach is the same in both scenarios: prefer broad tests against the highest-level interface possible, at a reasonable cost in runtime and engineering effort. I guess you might just call these integration tests. Unit test 'best practices', begone.