Getting Error Handling right in gRPC

Handling errors well can be tricky, and it can be even trickier in gRPC. gRPC currently has only limited built-in error handling, based on simple status codes and metadata. In this article, we will look at the limitations of gRPC error handling and how to overcome them to build a robust error handling framework. In the next article, we will examine how to handle errors in RESTful APIs using Spring Boot.

Code Example

The working code example for this article is available on GitHub. To run the example, clone the repository and import grpc-spring-boot as a project in your favorite IDE.

The code example consists of two microservices:

  • Product Gateway – acts as an API Gateway (client of Product Service) and exposes REST APIs (Gradle module product-api-gateway)
  • Product Service – exposes gRPC APIs (Gradle module product-service)

There is a third Gradle module, commons, which contains the common exceptions used by both Product Gateway Service and Product Service.

You can start these services from your IDE by running the main methods of ProductGatewayApplication and ProductApplication respectively.

You can test the application by calling the Product Gateway Service API:


curl --location --request GET 'http://localhost:8080/products/32c29935-da42-4801-825a-ac410584c281' \
--data-raw ''

Error handling in gRPC

By default, gRPC relies heavily on status codes for error handling, but this approach has certain drawbacks. Let’s understand them through an example.

In our sample application, the server-side Product Service exposes a gRPC method, getProduct. It fetches a Product from ProductRepository and returns the response to the client:


public void getProduct(
    GetProductRequest request, StreamObserver<GetProductResponse> responseObserver)  {

  String productId = request.getProductId();
  var product = productRepository.get(productId);

  var response =
      GetProductResponse.newBuilder()
          .setName(product.getName())
          .setDescription(product.getDescription())
          .setPrice(product.getPrice())
          .setUserId(product.getUserId())
          .build();
  responseObserver.onNext(response);
  responseObserver.onCompleted();
  log.info("Finished calling Product API service..");
}

ProductRepository fetches the Product from productStorage and throws an error if the Product is not found:

public Product get(String productId) {
  var product = Optional.ofNullable(productStorage.get(productId));
  return product.orElseThrow(() -> new ResourceNotFoundException("Product ID not found"));
}

You may ask why we need to throw a custom exception at all. Why not throw the gRPC-specific StatusRuntimeException, as in:

product.orElseThrow(() -> Status.NOT_FOUND.withDescription("Product ID not found").asRuntimeException());

The biggest benefit is separation of concerns. You don’t want to pollute business logic with gRPC-specific code, which belongs to the transport (API) layer.

The responsibility of the client application (Product Gateway Service) is to call the server application and convert the received response to the domain object. In case of an error, it simply wraps the error in a domain-specific exception, ServiceException(error.getCause()), and throws it to be handled upstream.


//Client call
public Product getProduct(String productId) {
  Product product = null;
  try {
    var request = GetProductRequest.newBuilder().setProductId(productId).build();
    var productApiServiceBlockingStub = ProductServiceGrpc.newBlockingStub(managedChannel);
    var response = productApiServiceBlockingStub.getProduct(request);
    // Map to domain object
    product = ProductMapper.MAPPER.map(response);
  } catch (StatusRuntimeException error) {
    log.error("Error while calling product service, cause {}", error.getMessage());
    throw new ServiceException(error.getCause());
  }
  return product;
}

This seems pretty straightforward, but there is one problem. When an error occurs, the client sees:


io.grpc.StatusRuntimeException: UNKNOWN

Why do we see StatusRuntimeException with the status UNKNOWN?

The service threw ResourceNotFoundException without ever calling responseObserver.onError(..), so gRPC closes the call with the default status code UNKNOWN and drops the original error message. The client then surfaces this as a StatusRuntimeException.

We can improve error handling by catching ResourceNotFoundException in the server’s service and calling responseObserver.onError(..):


//Server Product Service API
public void getProduct(
    GetProductRequest request, StreamObserver<GetProductResponse> responseObserver) {
  String productId = request.getProductId();
  try {
    var product = productRepository.get(productId);
    var response =
        GetProductResponse.newBuilder()
            .setName(product.getName())
            .setDescription(product.getDescription())
            .setPrice(product.getPrice())
            .setUserId(product.getUserId())
            .build();
    responseObserver.onNext(response);
    responseObserver.onCompleted();
  } catch (ResourceNotFoundException error) {
    log.error("Product id, {} not found", productId);
    var status = Status.NOT_FOUND.withDescription(error.getMessage()).withCause(error);
    responseObserver.onError(status.asException());
  }
  log.info("Finished calling Product API service..");
}

On the client-side, you will see:


Error while calling product service, cause NOT_FOUND: Product ID not found

Notice that the client doesn’t receive the original ResourceNotFoundException thrown by the server, so error.getCause() on the client returns null.


throw new ServiceException(error.getCause()); //error.getCause() is null

Why?

According to the official documentation of Status.withCause(Throwable cause), the cause is not transmitted from server to client:

Create a derived instance of Status with the given cause. However, the cause is not transmitted from server to client.

grpc-java documentation

Passing error metadata using gRPC Metadata

But what if you need to pass additional error details back to the client? For example, in our sample application, we may want to pass the ID of the Product and a standard error message when an error occurs. This can be done with gRPC Metadata. First, attach the details to the exception in the repository:


public Product get(String productId) {
  var product = Optional.ofNullable(productStorage.get(productId));

  return product.orElseThrow(
      () ->
          new ResourceNotFoundException(
              "Product ID not found",
              Map.of("resource_id", productId, "message", "Product ID not found")));
}

Fortunately, the ResourceNotFoundException class has an overloaded constructor that takes additional error metadata: ResourceNotFoundException(String message, Map<String, String> errorMetaData).

We can change the Product Service API by catching ResourceNotFoundException and calling responseObserver.onError(statusRuntimeException) with the additional metadata:
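For readers who do not have the commons module open, here is a minimal sketch of what such an exception could look like. Only the constructor signature and the getErrorMetaData() accessor come from the article; the field name, the defensive copy and the message-only constructor delegating to an empty map are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch of the domain exception; the real class lives in the commons module.
final class ResourceNotFoundException extends RuntimeException {

  private final Map<String, String> errorMetaData;

  // Message-only constructor, as used in the first repository example.
  ResourceNotFoundException(String message) {
    this(message, Map.of());
  }

  // Overloaded constructor that also carries error metadata.
  ResourceNotFoundException(String message, Map<String, String> errorMetaData) {
    super(message);
    // Defensive copy so later changes to the caller's map do not leak in.
    this.errorMetaData = Map.copyOf(errorMetaData);
  }

  Map<String, String> getErrorMetaData() {
    return errorMetaData;
  }
}
```

Because the class contains no gRPC types, the repository can throw it without depending on the transport layer.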


public void getProduct(
    GetProductRequest request, StreamObserver<GetProductResponse> responseObserver) {
  String productId = request.getProductId();
  try {
    var product = productRepository.get(productId);
    var response =
        GetProductResponse.newBuilder()
            .setName(product.getName())
            .setDescription(product.getDescription())
            .setPrice(product.getPrice())
            .setUserId(product.getUserId())
            .build();
    responseObserver.onNext(response);
    responseObserver.onCompleted();
  } catch (ResourceNotFoundException error) {
    log.error("Product id, {} not found", productId);
    var errorMetaData = error.getErrorMetaData();
    var metadata = new Metadata();
    errorMetaData.entrySet()
        .forEach(
            entry ->
                metadata.put(
                    Metadata.Key.of(entry.getKey(), Metadata.ASCII_STRING_MARSHALLER),
                    entry.getValue()));
    var statusRuntimeException =
        Status.NOT_FOUND.withDescription(error.getMessage()).asRuntimeException(metadata); 
    responseObserver.onError(statusRuntimeException);
  }
  log.info("Finished calling Product API service..");
}

Let’s understand what’s being done here.

  1. Get the error metadata from our custom ResourceNotFoundException by calling error.getErrorMetaData().
  2. For each key-value pair of the error metadata, create a key with Metadata.Key.of(entry.getKey(), Metadata.ASCII_STRING_MARSHALLER).
  3. Store the key-value pair in metadata by calling metadata.put(key, value).
  4. Create a StatusRuntimeException by passing the metadata to Status.asRuntimeException(metadata).
  5. Call responseObserver.onError(..) to report the error to the client.

On the client-side, you can catch StatusRuntimeException and read the Metadata from the error:


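} catch (StatusRuntimeException error) {
  // Trailers may be null, for example when the failure did not come from the server
  Metadata trailers = error.getTrailers();
  if (trailers != null) {
    for (String key : trailers.keys()) {
      // Keys ending in "-bin" hold binary values and cannot use the ASCII marshaller
      if (key.endsWith(Metadata.BINARY_HEADER_SUFFIX)) {
        continue;
      }
      Metadata.Key<String> k = Metadata.Key.of(key, Metadata.ASCII_STRING_MARSHALLER);
      log.info("Received key {}, with value {}", k, trailers.get(k));
    }
  }
}

In case of an error, the above code prints: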


Received key Key{name='resource_id'}, with value 32c29935-da42-4801-825a-ac410584c281
Received key Key{name='content-type'}, with value application/grpc
Received key Key{name='message'}, with value Product ID not found

As you can see, it’s not clear which metadata entries are error-related, because metadata can contain other information such as content-type (or trace information). You can of course define your own convention, for example prefixing all error metadata keys with err_.

There is another, cleaner way to handle error metadata propagation.
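A rough sketch of such a prefix convention is shown below. It is plain Java, with a Map<String, String> standing in for the gRPC trailers so that it runs without any gRPC dependency; the class name ErrorMetadataKeys and its two methods are hypothetical and not part of gRPC or of the sample project.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper illustrating an "err_" key convention; not part of gRPC.
final class ErrorMetadataKeys {

  static final String PREFIX = "err_";

  private ErrorMetadataKeys() {}

  // Server side: prefix every error key before copying it into the trailers.
  static Map<String, String> withPrefix(Map<String, String> errorMetaData) {
    Map<String, String> prefixed = new TreeMap<>();
    errorMetaData.forEach((key, value) -> prefixed.put(PREFIX + key, value));
    return prefixed;
  }

  // Client side: keep only prefixed entries and strip the prefix again.
  static Map<String, String> errorEntries(Map<String, String> trailers) {
    Map<String, String> errors = new TreeMap<>();
    trailers.forEach(
        (key, value) -> {
          if (key.startsWith(PREFIX)) {
            errors.put(key.substring(PREFIX.length()), value);
          }
        });
    return errors;
  }
}
```

On the server, the map returned by withPrefix(..) would be copied into Metadata exactly as before; on the client, entries such as content-type are filtered out because they lack the prefix.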


Google Richer Error Model

Google’s google.rpc.Status provides much richer error handling capabilities. This approach is used by Google APIs, but it is not yet part of the official gRPC error model. Internally, it still uses metadata, but in a cleaner way: the serialized status travels in the grpc-status-details-bin trailer. The google.rpc.Status message is defined as:


package google.rpc;

// The `Status` type defines a logical error model that is suitable for
// different programming environments, including REST APIs and RPC APIs.
message Status {
  // A simple error code that can be easily handled by the client. The
  // actual error code is defined by `google.rpc.Code`.
  int32 code = 1;

  // A developer-facing human-readable error message in English. It should
  // both explain the error and offer an actionable resolution to it.
  string message = 2;

  // Additional error information that the client code can use to handle
  // the error, such as retry info or a help link.
  repeated google.protobuf.Any details = 3;
}

Be aware of the gotchas associated with this approach: it is not supported by all language libraries, and implementations may not be consistent across languages.

The richness of error handling comes from the repeated google.protobuf.Any details field. From the documentation:

`Any` contains an arbitrary serialized protocol buffer message along with a
URL that describes the type of the serialized message.

You can use Any to pack your own custom error model or use any of the predefined messages in error_details.proto. Let’s look at both approaches.

Using custom error model

Define your own custom error model as:


message ErrorDetail {
  // Error code
  string error_code = 1;
  // Error message
  string message = 2;
  // Additional metadata associated with the error
  map<string, string> metadata = 3;
}

On the server-side Product Service, build the ErrorDetail model and add it to com.google.rpc.Status by calling .addDetails(Any.pack(errorInfo)):


//Catch Block
} catch (ResourceNotFoundException error) {
   log.error("Product id, {} not found", productId);
   var errorMetaData = error.getErrorMetaData();
   Resources.ErrorDetail errorInfo =
       Resources.ErrorDetail.newBuilder()
           .setErrorCode("ResourceNotFound")
           .setMessage(error.getMessage())
           .putAllMetadata(errorMetaData)
           .build();
   com.google.rpc.Status status =
       com.google.rpc.Status.newBuilder()
           .setCode(Code.NOT_FOUND.getNumber())
           .setMessage("Product id not found")
           .addDetails(Any.pack(errorInfo))
           .build();
   responseObserver.onError(StatusProto.toStatusRuntimeException(status));
 }

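And, on the client-side Product Gateway Service, change the catch block as:


//Catch Block
} catch (StatusRuntimeException error) {
   // fromThrowable returns null when the error carries no google.rpc.Status
   com.google.rpc.Status status = io.grpc.protobuf.StatusProto.fromThrowable(error);
   Resources.ErrorDetail errorInfo = null;
   if (status != null) {
     for (Any any : status.getDetailsList()) {
       if (!any.is(Resources.ErrorDetail.class)) {
         continue;
       }
       try {
         errorInfo = any.unpack(Resources.ErrorDetail.class);
       } catch (InvalidProtocolBufferException e) {
         log.warn("Could not unpack ErrorDetail", e);
       }
     }
   }
   if (errorInfo == null) {
     // No structured details: fall back to the plain gRPC status
     throw new ServiceException(error);
   }
   log.info(" Error while calling product service, reason {} ", errorInfo.getMessage());
   throw new ServiceException(errorInfo.getMessage(), errorInfo.getMetadataMap());
 }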

Using pre-defined error model

Rather than defining your own error model, you can use predefined error models from error_details.proto. For example, you can use ErrorInfo defined as:


message ErrorInfo {

  // The reason of the error. This is a constant value that identifies the
  // proximate cause of the error. Error reasons are unique within a particular
  // domain of errors. This should be at most 63 characters and match
  // /[A-Z0-9_]+/.
  string reason = 1;

  // The logical grouping to which the "reason" belongs. The error domain
  // is typically the registered service name of the tool or product that
  // generates the error. Example: "pubsub.googleapis.com". If the error is
  // generated by some common infrastructure, the error domain must be a
  // globally unique value that identifies the infrastructure. For Google API
  // infrastructure, the error domain is "googleapis.com".
  string domain = 2;

  // Additional structured details about this error.
  // Keys should match /[a-zA-Z0-9-_]/ and be limited to 64 characters in
  // length. When identifying the current value of an exceeded limit, the units
  // should be contained in the key, not the value.  For example, rather than
  // {"instanceLimit": "100/request"}, should be returned as,
  // {"instanceLimitPerRequest": "100"}, if the client exceeds the number of
  // instances that can be created in a single (batch) request.
  map<string, string> metadata = 3;
}

On the server-side Product Service, you can use com.google.rpc.ErrorInfo as:


} catch (ResourceNotFoundException error) {
  var errorMetaData = error.getErrorMetaData();
  ErrorInfo errorInfo =
      ErrorInfo.newBuilder()
          // reason should match /[A-Z0-9_]+/ per error_details.proto
          .setReason("RESOURCE_NOT_FOUND")
          .setDomain("Product")
          .putAllMetadata(errorMetaData)
          .build();
  com.google.rpc.Status status =
      com.google.rpc.Status.newBuilder()
          .setCode(Code.NOT_FOUND.getNumber())
          .setMessage("Product id not found")
          .addDetails(Any.pack(errorInfo))
          .build();
  responseObserver.onError(StatusProto.toStatusRuntimeException(status));
}

The only change on the client-side is to use the compiled ErrorInfo class:


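//Catch Block
} catch (StatusRuntimeException error) {
   // fromThrowable returns null when the error carries no google.rpc.Status
   com.google.rpc.Status status = io.grpc.protobuf.StatusProto.fromThrowable(error);
   ErrorInfo errorInfo = null;
   if (status != null) {
     for (Any any : status.getDetailsList()) {
       if (!any.is(ErrorInfo.class)) {
         continue;
       }
       try {
         errorInfo = any.unpack(ErrorInfo.class);
       } catch (InvalidProtocolBufferException e) {
         log.warn("Could not unpack ErrorInfo", e);
       }
     }
   }
   if (errorInfo == null) {
     // No structured details: fall back to the plain gRPC status
     throw new ServiceException(error);
   }
   log.info(" Error while calling product service, reason {} ", errorInfo.getReason());
   throw new ServiceException(errorInfo.getReason(), errorInfo.getMetadataMap());
 }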

Global Interceptor for error handling

The approach of catching and throwing exceptions in the server-side Product Service can quickly get very complex and clumsy. In the case of complex business logic, you may end up with code like catch (ResourceNotFoundException | ServiceException | OtherException error).

We can simplify this by using a gRPC interceptor. The interceptor catches such exceptions and processes them accordingly as:


public class GlobalExceptionHandlerInterceptor implements ServerInterceptor {

  @Override
  public <T, R> ServerCall.Listener<T> interceptCall(
      ServerCall<T, R> serverCall, Metadata headers, ServerCallHandler<T, R> serverCallHandler) {
    ServerCall.Listener<T> delegate = serverCallHandler.startCall(serverCall, headers);
    return new ExceptionHandler<>(delegate, serverCall, headers);
  }

  private static class ExceptionHandler<T, R>
      extends ForwardingServerCallListener.SimpleForwardingServerCallListener<T> {

    private final ServerCall<T, R> delegate;
    private final Metadata headers;

    ExceptionHandler(
        ServerCall.Listener<T> listener, ServerCall<T, R> serverCall, Metadata headers) {
      super(listener);
      this.delegate = serverCall;
      this.headers = headers;
    }

    @Override
    public void onHalfClose() {
      try {
        super.onHalfClose();
      } catch (RuntimeException ex) {
        handleException(ex, delegate, headers);
        throw ex;
      }
    }

    private void handleException(
        RuntimeException exception, ServerCall<T, R> serverCall, Metadata headers) {
      // Catch specific Exception and Process
      if (exception instanceof ResourceNotFoundException) {
        var errorMetaData = ((ResourceNotFoundException) exception).getErrorMetaData();
        // Build google.rpc.ErrorInfo
        var errorInfo =
            ErrorInfo.newBuilder()
                // reason should match /[A-Z0-9_]+/ per error_details.proto
                .setReason("RESOURCE_NOT_FOUND")
                .setDomain("Product")
                .putAllMetadata(errorMetaData)
                .build();

        com.google.rpc.Status rpcStatus =
            com.google.rpc.Status.newBuilder()
                .setCode(Code.NOT_FOUND.getNumber())
                .setMessage("Product id not found")
                .addDetails(Any.pack(errorInfo))
                .build();

        var statusRuntimeException = StatusProto.toStatusRuntimeException(rpcStatus);

        var newStatus = Status.fromThrowable(statusRuntimeException);
        // Get metadata from statusRuntimeException
        Metadata newHeaders = statusRuntimeException.getTrailers();

        serverCall.close(newStatus, newHeaders);
      } else {
        // Send empty trailers; the request headers should not be echoed back to the client
        serverCall.close(Status.UNKNOWN, new Metadata());
      }
    }
  }
}

Let’s understand what’s being done here:

  • First, create ExceptionHandler, which overrides onHalfClose(), by extending ForwardingServerCallListener.SimpleForwardingServerCallListener<T>.
  • The handleException(..) method first builds google.rpc.ErrorInfo and then adds the ErrorInfo to com.google.rpc.Status, which internally builds the new metadata containing ErrorInfo.
  • Because serverCall.close(status, newHeaders) takes an io.grpc.Status, we need to convert com.google.rpc.Status by calling Status.fromThrowable(statusRuntimeException).
  • Then all we need to do is call serverCall.close(status, newHeaders) with the io.grpc.Status and the new metadata.

The only change needed in the server-side implementation of the Product Service API is to remove the catch block and exception processing logic:
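The handleException(..) method above only knows about ResourceNotFoundException. As more exception types appear, one option is to move the exception-to-status mapping into a small lookup table that the interceptor consults. The sketch below is plain Java and deliberately uses status code names as strings so that it runs without gRPC on the classpath; ExceptionStatusRegistry and its methods are hypothetical names, not part of grpc-java or the sample project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table; status codes are plain strings to keep the sketch free of gRPC types.
final class ExceptionStatusRegistry {

  private final Map<Class<? extends Throwable>, String> codes = new LinkedHashMap<>();

  // Registers a mapping; more specific exception types should be registered first.
  ExceptionStatusRegistry register(Class<? extends Throwable> type, String statusCode) {
    codes.put(type, statusCode);
    return this;
  }

  // Returns the code of the first registered type the exception is an instance of, else UNKNOWN.
  String resolve(Throwable exception) {
    for (Map.Entry<Class<? extends Throwable>, String> entry : codes.entrySet()) {
      if (entry.getKey().isInstance(exception)) {
        return entry.getValue();
      }
    }
    return "UNKNOWN";
  }
}
```

Inside the interceptor, the resolved name could then be turned into an io.grpc.Status, for example with Status.Code.valueOf(name).toStatus(), before calling serverCall.close(..). Registration order matters because the first matching type wins.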


public void getProduct(
    GetProductRequest request, StreamObserver<GetProductResponse> responseObserver) {

  String productId = request.getProductId();
  var product = productRepository.get(productId);
  var response =
      GetProductResponse.newBuilder()
          .setName(product.getName())
          .setDescription(product.getDescription())
          .setPrice(product.getPrice())
          .setUserId(product.getUserId())
          .build();
  responseObserver.onNext(response);
  responseObserver.onCompleted();
}

On the client-side, there is no change: we can still get an instance of the ErrorInfo class with errorInfo = any.unpack(ErrorInfo.class).

Using Spring Interceptor

If you can use grpc-spring-boot-starter, everything becomes much simpler. All you need to do is create a class annotated with @GrpcAdvice and provide methods to handle the individual exceptions:


@GrpcAdvice
public class ExceptionHandler {

  @GrpcExceptionHandler(ResourceNotFoundException.class)
  public StatusRuntimeException handleResourceNotFoundException(ResourceNotFoundException cause) {
    var errorMetaData = cause.getErrorMetaData();
    var errorInfo =
        ErrorInfo.newBuilder()
            // reason should match /[A-Z0-9_]+/ per error_details.proto
            .setReason("RESOURCE_NOT_FOUND")
            .setDomain("Product")
            .putAllMetadata(errorMetaData)
            .build();
    var status =
        com.google.rpc.Status.newBuilder()
            .setCode(Code.NOT_FOUND.getNumber())
            .setMessage("Resource not found")
            .addDetails(Any.pack(errorInfo))
            .build();
    return StatusProto.toStatusRuntimeException(status);
  }
}

This approach is similar to Spring error handling. You just need to define a method with annotation @GrpcExceptionHandler, for example @GrpcExceptionHandler(ResourceNotFoundException.class), for the specific error condition. That’s it, no other change is needed on the server-side.

Summary

Getting error handling right can be very tricky in gRPC. Officially, gRPC relies heavily on status codes and metadata to handle errors. We can use gRPC metadata to pass additional error details from the server application to the client application. Google’s google.rpc.Status provides much richer error handling capabilities, but it’s not fully supported in all languages. It’s possible to define a global gRPC interceptor to handle all error conditions centrally. The Spring Boot wrapper library yidongnan/grpc-spring-boot-starter provides a much cleaner approach to handling errors.

Pankaj
Software Architect @ Schlumberger. Cloud | Microservices | Programming | Kubernetes | Architecture | Machine Learning | Java | Python