Design for Success, Not Failure: Error Handling

2021-04-07

When writing code, optimize readability for the success case rather than the rare failure case. You might have seen this before - code littered with layers and layers of exception handling or defensive if statements, all in the name of resilience and fault tolerance. In my experience, this approach can make systems unpleasant to work in since the core logic is buried under exception handling blocks and if guards. Too much exception handling can also insidiously hide real issues when they do occur - the original error may be so obscured that you lose visibility into what actually happened, and in a huge and complicated codebase your layers of exception handling may even start to create implicit dependencies on each other.

Only Handle Known Errors

Exception handling should be treated like performance optimization. When writing code, the common advice is to delay performance optimization until it is needed and initially optimize for readability. When optimization is needed, you take measurements to make sure it is worth doing. This is because optimizing for performance from the beginning can make your code hard to understand, which is often the biggest bottleneck for productivity, and you need to look at the system holistically to make sure you are optimizing the highest-impact thing. Similarly, you should delay exception handling until you absolutely need it, such as when you want to implement some graceful failure behaviour, and you should make sure you are only handling known errors, not speculatively handling unknown errors.

One of the pitfalls of handling unknown errors is that you can end up hiding stupid bugs like this:

try:
  arrray[0]
except Exception:
  print("Failed to index array")

The code that is trying to index an array is surrounded with an exception handler that catches everything. However, there is a typo in the variable name. Python will raise a NameError at runtime, which is caught by the exception handler, and the problem may never be known until it manifests indirectly in another part of the codebase. The poor developer trying to debug this may spend hours trying to figure out the source of the issue if the typo is deeply nested somewhere in the call stack. This has actually happened to me many times at work, where overzealous exception handling like this cost me several hours of my life.

Instead, what the author may have intended was to catch errors where an index might not have existed:

try:
  arrray[0]
except IndexError:
  print("Failed to index array")

This would have let the NameError propagate, which would have allowed me to identify the root cause a lot faster.

You want to be selective about which errors you catch. The only reason to try-catch a piece of code is if you are trying to catch a specific error that could potentially be thrown. This could be a request failure error that might occur when you're making a network request, or a parse error when you're parsing user input. You never want to be try-catching something that you don't understand or are not aware of since it will obfuscate the deeper issue at hand if those types of errors occur. Your users will suffer for it as the error may manifest itself indirectly in ways that are hard to understand and debug.
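As a rough sketch of what being selective looks like for parsing user input, the example below catches only the ValueError that int() raises for malformed text and lets everything else propagate. The parse_age helper is a hypothetical name used purely for illustration.

```python
def parse_age(raw):
    """Parse a user-supplied age, returning None when the text is not a number.

    parse_age is a hypothetical helper used only for illustration.
    """
    try:
        return int(raw.strip())
    except ValueError:
        # Known, expected failure: the user typed something that is not an integer.
        return None


print(parse_age(" 42 "))   # 42
print(parse_age("forty"))  # None

# A programming mistake is not swallowed: parse_age(None) raises AttributeError
# (None has no .strip()), and that error propagates to the caller.
```

Because the handler names the one error it expects, a bug in the calling code still surfaces with its original stack trace.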

Use Root Handlers to Catch Unknowns or Knowns That You've Missed

Unknown errors can still crash your program, and you may have missed adding exception handling for some known errors. This is not ideal for software that needs to self-recover, such as a UI or a web server. What do you do with errors that are not covered by your existing error handlers? You should define a root error handler.

The root error handler gives you a centralized place to handle the error. Its scope is usually broad, as it's defined somewhere high up in the call stack where it can catch most errors, if not all of them. The root error handler can stop errors from propagating further if you don't want to completely crash your program, or if there is a way to gracefully recover from the error, such as showing a generic "Something went wrong" message to the user in lieu of an intimidating stack trace.
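A minimal sketch of the idea in Python, assuming a single top-level entry point: the one broad handler lives at the root, logs the full stack trace, and returns a generic message. The names with_root_handler and handle_request are hypothetical and exist only for this example.

```python
import logging

logger = logging.getLogger("app")


def with_root_handler(handler):
    """Wrap a top-level entry point so unexpected errors are logged, not leaked.

    with_root_handler and handle_request are hypothetical names for this sketch.
    """
    def wrapped(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception:
            # Record the full stack trace for developers...
            logger.exception("Unhandled error in %s", handler.__name__)
            # ...and give the user a generic message instead.
            return "Something went wrong"
    return wrapped


@with_root_handler
def handle_request(path):
    routes = {"/": "Welcome"}
    return routes[path]  # KeyError for unknown paths: an error nobody planned for


print(handle_request("/"))         # Welcome
print(handle_request("/missing"))  # Something went wrong
```

The code inside handle_request stays free of defensive try blocks; the catch-all exists exactly once, at the boundary.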

Root error handling manifests itself in many ways. For example, in React, you can define an ErrorBoundary component to handle errors thrown at render time by child components. This is pretty important to avoid the blank white screen of death that was pretty common in a lot of React apps before TypeScript was a thing, since it was pretty easy to crash the UI because of 'cannot read property of null/undefined' errors. You can place an ErrorBoundary component at the root of your React application to handle all the unknown errors and notify the user that something unexpected has occurred. In this component, you can also add instrumentation to log the errors to your monitoring system.

Another example of root error handling is in Node HTTP request handlers. Because of the asynchronous design of Node request handlers, you need to make sure you explicitly end the response even when an error is thrown; otherwise the HTTP client could be stuck hanging forever. For example, if getData below throws an error before .end() is called, the HTTP request could hang indefinitely:

http.createServer(async (req, res) => {
  const data = await getData();
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ data }));
});

Most Node web frameworks wrap this with a root error handler that either stops the server entirely (to avoid memory leaks) or responds with a 500.

http.createServer(
  withErrorHandler(async (req, res) => {
    const data = await getData();
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ data }));
  })
);

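function withErrorHandler(requestHandler) {
  // The wrapper must be async and await the handler; otherwise a rejected
  // promise from the async handler would bypass the try/catch entirely.
  return async (req, res) => {
    try {
      await requestHandler(req, res);
    } catch (e) {
      res.writeHead(500, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "Internal server error" }));
    }
  };
}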

So there are many types of root error handlers and software can have multiple root error handlers, usually at network and process boundaries.

Consider Modeling Error as Data

One way to handle errors is to stop treating them as exceptional. There are scenarios where certain errors are actually expected and are part of the happy path of your code. Parsing and validating user input is maybe the most classic example. Consider this function that parses user input:

function parseUserInput(input: string): string;

When this function fails, the only way it can signal failure is by throwing an exception, since its signature has no other way to express it. However, writing a try-catch every time you call this function can be arduous, and since the possibility of failure isn't documented anywhere in the function signature, you could forget to do it because the compiler won't remind you. We don't want parse failures propagating to the root error handler, since this error is non-exceptional and an expected part of handling user input.

Instead, we can just model error as data:

type Ok<T> = [error: false, data: T];
type Err = [error: true, data: undefined];
type Result<T> = Err | Ok<T>;

function parseUserInput(input: string): Result<string> {
  let result;
  try {
    result = parse(input);
  } catch (e) {
    if (e instanceof ParseError) {
      return [true, undefined];
    }
    throw e;
  }
  return [false, result];
}

This is inspired by the Go error handling pattern: the function returns a tuple where one field represents the error and the other holds the result. It is a tagged union, so the caller needs to check the first value of the tuple for TypeScript's type system to discriminate between the success and failure tuples.

const [error, parsedInput] = parseUserInput("example input");

if (error) {
  const x: undefined = parsedInput;
} else {
  const x: string = parsedInput;
}

Treating error as data is especially valuable when a type system is involved, since it forces you to handle the error scenario: you can't forget to handle error cases without the compiler complaining about it. This is a common technique that comes from functional programming, as seen in languages like Haskell with its Either type and Scala with its Try class. In TypeScript, I've been using the neverthrow library, which is a package that provides the Result type. You can wrap functions that throw and turn them into functions that return error as data:

const safeParse = Result.fromThrowable(parse, () => "Parse failed");
const result = safeParse("example input");

if (result.isErr()) {
  console.error(result.error);
} else {
  console.log(result.value);
}

It pretty much uses the tagged union I showed earlier with the tuples, but with better ergonomics and useful utility functions, such as ok and err data constructors for the Result type and the ability to map the error in a Result so that you can transform it to a different format.
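To make the idea of mapping the error concrete, here is a small language-agnostic sketch written in Python. It is not neverthrow's API: Ok, Err, map_err and parse_port are hypothetical names defined here only to illustrate how an error carried as data can be transformed without touching the success path.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")
F = TypeVar("F")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


Result = Union[Ok[T], Err[E]]


def map_err(result: Result[T, E], fn: Callable[[E], F]) -> Result[T, F]:
    """Transform the error inside an Err; pass an Ok through unchanged."""
    if isinstance(result, Err):
        return Err(fn(result.error))
    return result


def parse_port(raw: str) -> Result[int, ValueError]:
    """Hypothetical parser that returns its failure as data instead of raising."""
    try:
        return Ok(int(raw))
    except ValueError as exc:
        return Err(exc)


# Convert the low-level exception into a user-facing message.
friendly = map_err(parse_port("http"), lambda exc: f"Invalid port: {exc}")
print(friendly)
print(map_err(parse_port("8080"), str))  # Ok(value=8080)
```

The caller decides how the error should be presented, while the parser only reports what went wrong.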

Takeaways

Don't be overzealous with error handling - it hurts readability and makes the codebase a slog to work in.
Only handle known errors and let unknown errors propagate.
Root error handlers are useful for logging or gracefully recovering from unknown errors or known errors that you've missed adding error handling for.
Consider treating non-exceptional errors as data using functional programming patterns such as Either, Try or Result.
For most cases, just let the errors flow to the root handler.