Distributed Transactions in Microservices: implementing Saga with Temporal

One important design principle of microservices architecture is database per service pattern. This pattern helps us to keep services loosely coupled so that they can be developed, deployed, and scaled independently. In other words, the domain data is encapsulated within a microservice and if other services need data then they do so by calling the APIs.

This design principle brings one interesting challenge – coordinating writes as distributed transactions in microservices.

In a database management system, a transaction is a single unit of logic or work, sometimes made up of multiple operations

One example of such distributed business transaction is order fulfillment in an e-commerce application (discussed in the earlier article Workflow Orchestration with Temporal and Spring Boot). An order fulfillment workflow results in writes to different services such as Order Service, Payment Service, and Shipping Service. In a distributed transaction, either all steps of the transaction need to succeed or the transaction fails as a whole i.e., there is no intermediate state, for example, payment is debited but inventory is not reserved.

Implementing distributed transactions in a microservices application using a two-phase commit may not be an option because of many reasons, such as scalability concerns, complexity, no support in the services database, etc.

So, how do we implement transactions in a microservices application?

The answer is Saga!

What is Saga?

A Saga is a sequence of local transactions where each local transaction updates the database and publishes a message or event to trigger the next local transaction in the Saga. If a local transaction fails, the saga executes a series of compensating transactions that undo the changes that were made by the preceding local transactions.

Here, “publishes a message or event” doesn’t mean that Saga always involves pub/sub or message bus. As we’ll see shortly, there are two ways to implement a Saga – Choreography and Orchestration. Chorography almost always involves messaging whereas Orchestration doesn’t need to.

As you can see in the above diagram, each microservice implements a compensating transaction corresponding to the steps of a business transaction.

There are two common approaches to implement a Saga – Choreography and Orchestration.

Choreography

In Choreography, the Saga participants exchange messages without a central point of control. Each participant publishes domain events, after completing the local transaction, and other participants react to the domain event and execute their local transactions.

In the case of failure, the Saga participant publishes compensating transaction event and other participants react accordingly and reverse the local transaction.

For our example use case, the choreography can look like this (compensation not shown for brevity):

Orchestration

In Orchestration, a centralized controller (Orchestrator) tells the Saga participants what local transaction to execute. The Saga Orchestrator manages saga requests, stores and interprets the states of each step, and handles failure recovery with compensating transactions.

For our example use case, the orchestration can look like this:

In the above diagram, the Order service works as a central controller (Orchestrator).

Implementing Saga with Temporal

Implementing Saga in a microservices application is not straightforward. Moreover, a Choreography-based Saga is much more complicated to implement than an Orchestration-based Saga. Thankfully, workflow engines like Temporal greatly simplify Saga implementation. In Temporal a Saga is implemented as Orchestration.

In the earlier article Workflow Orchestration with Temporal and Spring Boot, we have seen the implementation of distributed order fulfillment workflow (shown below) in Temporal.


public void createOrder(OrderDTO orderDTO) {
  paymentActivity.debitPayment(orderDTO);
  reserveInventoryActivity.reserveInventory(orderDTO);
  shipGoodsActivity.shipGoods(orderDTO);
  orderActivity.completeOrder(orderDTO);
}

The above workflow definition doesn’t have compensating transaction steps. Because of that, there can be data consistency issues in case of failure. For example, if the reserve inventory step errors out then payment should reverse, which is not the case.

To implement Saga in Temporal, all we need to do is to define compensating transactions as Activities and associate them with Saga object as:


public void createOrder(OrderDTO orderDTO) {
    // Configure SAGA to run compensation activities in parallel
    Saga.Options sagaOptions = new Saga.Options.Builder().setParallelCompensation(true).build();
    Saga saga = new Saga(sagaOptions);
    try {
      paymentActivities.debitPayment(orderDTO);
      saga.addCompensation(paymentActivities::reversePayment, orderDTO);
      //Inventory
      inventoryActivities.reserveInventory(orderDTO);
      saga.addCompensation(inventoryActivities::releaseInventory, orderDTO);
      //Shipping
      shippingActivities.shipGoods(orderDTO);
      saga.addCompensation(shippingActivities::cancelShipment, orderDTO);
      //Order
      orderActivities.completeOrder(orderDTO);
      saga.addCompensation(orderActivities::failOrder, orderDTO);
    } catch (ActivityFailure cause) {
      saga.compensate();
      throw cause;
    }
  }

In case of an error, Temporal runs all compensating transactions preceding the Activity step that threw the error. For example, if the execution of step shippingActivities.shipGoods(orderDTO) results in an error then the Temporal propagates that error to the Workflow implementation OrderFulfillmentWorkflowImpl. Once, the error is caught in the catch block, then saga.compensate() is called, which starts the compensation and runs all previously registered compensation. Throwing error throw cause results in failure of workflow. We can query the Workflow state using Temporal client SDK APIs or analyze the error stack trace in the UI.

In our code example, we have simulated the error condition by throwing the domain ServiceException in ShipmentServiceImpl class based on config parameter simulate_error=true as:


@Override
public String shipGoods(Double quantity) {
  // Simulate Error condition
  if (applicationProperties.isSimulateError() ) {
    log.error("Error occurred while shipping..");
    throw new ServiceException("Error executing Service");
  }
  UUID uuid = UUID.randomUUID();
  Thread.sleep(2000);
  return uuid.toString();
}

The ShipmentServiceImpl is called from ShippingActivitiesImpl as:


public void shipGoods(OrderDTO orderDTO) {
  log.info("Dispatching shipment,  order id {}", orderDTO.getOrderId());
  var trackingId = shipmentService.shipGoods(orderDTO.getQuantity());

  var shipment =
      Shipment.builder()
          .orderId(orderDTO.getOrderId())
          .productId(orderDTO.getProductId())
          .quantity(orderDTO.getQuantity())
          .trackingId(trackingId)
          .build();
  shipmentRepository.save(shipment);

  log.info("Created shipment for order id {}", orderDTO.getOrderId());
}

In our simple implementation of compensating transactions, we are just changing the status of the domain object Payment and Inventory. For example, PaymentActivities compensation reversePayment is implemented as:


@Override
public void reversePayment(OrderDTO orderDTO) {
  log.info("Reversing payment for order {}", orderDTO.getOrderId());
  var payment =
      paymentRepository
          .getByOrderId(orderDTO.getOrderId())
          .orElseThrow(() -> new ResourceNotFoundException("Order id not found"));
  payment.setPaymentStatus(PaymentStatus.REVERSED);
  paymentRepository.save(payment);
}

Obviously, such naive implementation is not for the production use case. In the production code, we may want to call the external payment service to reverse the payment and then change the status of the domain object and persist.

Code Example

You can find the working code example of this article on GitHub . To run the example, clone the repository, and import order-fulfillment-saga as a project in your favorite IDE as a Gradle project.

In this code example, we have implemented microservices using Spring Boot and their integration with Temporal using Temporal Java client SDK.

Code structure

You can find more information about code in README.md.

You can check the earlier article Workflow Orchestration with Temporal and Spring Boot to understand more about how to implement a workflow in Temporal.

Testing Saga

You can install Temporal locally by using Docker as,


git clone <https://github.com/temporalio/docker-compose.git>
cd  docker-compose
docker-compose up

Once Temporal Server is started, you can view Temporal UI by visiting http://localhost:8080

To build run ./gradlew clean build from the command line. You can start microservices from IDE or by running the jar.

Once microservices are running, to trigger the Workflow execution run below cURL.


curl --location --request POST 'localhost:8081/orders' \
--header 'Content-Type: application/json' \
--data-raw '{
  "productId": 30979484,
  "price": 28.99,
  "quantity": 2
}'

You can view the list of payments generated as part of workflow execution as:


curl --location --request GET 'localhost:8083/payments' \
--header 'Accept: application/json'

The response of the above cURL may look like this:


[
    {
        "paymentId": 1,
        "orderId": 1,
        "productId": 30979484,
        "amount": 57.98,
        "externalId": "de3cfdda-3afc-4110-8c2a-2bba904c94ac",
        "paymentStatus": "REVERSED"
    }
]

As you can see the payment status is changed from ACTIVE to REVERSED because of compensating transaction of the Saga.

Similarly, to view the list of inventory run the following cURL.


curl --location --request GET 'localhost:8082/inventories' \
--header 'Accept: application/json'

The response of the above cURL may look like the following.


[
    {
        "inventoryId": 1,
        "orderId": 1,
        "productId": 30979484,
        "quantity": 2.0,
        "inventoryStatus": "RELEASED"
    }
]

In Temporal UI, you can see failed workflow along with the reason and stack trace.

Workflow steps
Stack trace

Summary

  • Saga is a design pattern to implement distributed transactions in microservices.
  • A Saga is a sequence of local transactions where each local transaction updates the database and publishes a message or event to trigger the next local transaction in the Saga.
  • There are two common approaches to implement a Saga – Choreography and Orchestration.
  • In Choreography, there is no central coordinator whereas Orchestration involves a central coordinator.
  • In Temporal, a Saga is implemented as Orchestration.
Social Share !
Pankaj
Pankaj

Software Architect @ Schlumberger ``` Cloud | Microservices | Programming | Kubernetes | Architecture | Machine Learning | Java | Python ```