
Resiliency Patterns

Last Updated on October 6, 2023 by KnownSense

Even with careful design, a microservices application can still run into problems in production. It is the development team's job to make sure the application handles those problems gracefully. Different applications may need different approaches, but a common one is to apply microservices resiliency patterns.

In Microservices Design Principle: Resiliency, we already showed how to design a resilient system. In this article, we will show you how and when to use resiliency patterns to prevent a single failure from cascading through a microservices application.

Resiliency Patterns

Circuit Breaker pattern

The Circuit Breaker pattern in microservices is a design pattern used to enhance the resilience and stability of a distributed system. It works like an electrical circuit breaker, preventing the system from making calls to a service that is experiencing issues, such as high latency or failure, for a predefined period. This helps in avoiding resource wastage and further overloading of the troubled service, allowing it time to recover.

A circuit breaker has three states: Closed, Open, and Half-Open.

  1. Closed: In this state, the circuit allows requests to flow through as usual, monitoring for failures.
  2. Open: When a predefined failure threshold is reached, the circuit “opens,” blocking requests to the troubled service to prevent further strain.
  3. Half-Open: After a set time, the circuit transitions to this state, allowing a limited number of test requests through to check if the troubled service has recovered. If they succeed, the circuit closes again; if they fail, it returns to the Open state.
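The state machine above can be sketched in plain Java. This is a minimal, hand-rolled illustration only; the class and method names are invented for this sketch, and a production system would normally rely on a library such as Resilience4j instead.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Minimal circuit breaker with the three states described above.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final Duration openDuration;  // how long to stay open before probing
    private State state = State.CLOSED;
    private int failureCount = 0;
    private Instant openedAt;

    SimpleCircuitBreaker(int failureThreshold, Duration openDuration) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
    }

    synchronized <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (Instant.now().isAfter(openedAt.plus(openDuration))) {
                state = State.HALF_OPEN;  // allow a probe request through
            } else {
                return fallback.get();    // fail fast, don't touch the service
            }
        }
        try {
            T result = operation.get();
            state = State.CLOSED;         // success (or successful probe) closes the circuit
            failureCount = 0;
            return result;
        } catch (RuntimeException e) {
            failureCount++;
            if (state == State.HALF_OPEN || failureCount >= failureThreshold) {
                state = State.OPEN;       // trip (or re-trip) the breaker
                openedAt = Instant.now();
            }
            return fallback.get();
        }
    }

    synchronized State state() { return state; }
}
```

Note that every path returns quickly: while the circuit is open, callers get the fallback immediately instead of piling up requests against the failing service.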

Use the Circuit Breaker pattern when:

  1. Dealing with external services: It’s particularly useful when interacting with external services or APIs to prevent cascading failures in your application.
  2. Handling latency: When you want to reduce the impact of slow responses on the overall system performance.
  3. Enhancing fault tolerance: To make your microservices more resilient by isolating failures and providing graceful degradation.

Do not use the Circuit Breaker pattern when:

  1. Overhead is a concern: If the system is very lightweight and the cost of implementing the pattern outweighs the benefits.
  2. Real-time requirements: In cases where real-time responsiveness is crucial, as the Circuit Breaker introduces some delay in handling failures.
  3. Limited fault tolerance: If your system cannot tolerate any downtime or reduced functionality, the Circuit Breaker may not be the best fit.

The Circuit Breaker pattern is a valuable tool for enhancing the resilience of microservices in scenarios where failures need to be isolated and managed. However, it should be used judiciously, considering the specific requirements and constraints of your application.

Bulkhead Pattern

The Bulkhead Pattern in microservices is a design principle aimed at isolating and compartmentalizing different components of a system to prevent failures in one part from causing failures in others. It draws its name from the compartments (bulkheads) in a ship that prevent water from flooding the entire vessel if one section is breached.
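As a minimal sketch of the idea, each downstream dependency can be given its own fixed-size thread pool, so that a slow dependency can exhaust only its own threads and never the ones reserved for others. The service names ("payments", "inventory") and pool sizes below are invented examples.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative bulkheads: one small thread pool per downstream dependency.
class Bulkheads {
    // Pool sizes are arbitrary example values.
    static final ExecutorService paymentsPool  = Executors.newFixedThreadPool(4);
    static final ExecutorService inventoryPool = Executors.newFixedThreadPool(4);

    // Calls to each dependency only ever run on that dependency's pool.
    static <T> Future<T> callPayments(Callable<T> task)  { return paymentsPool.submit(task); }
    static <T> Future<T> callInventory(Callable<T> task) { return inventoryPool.submit(task); }

    // Convenience: block for a result, wrapping checked exceptions.
    static <T> T await(Future<T> future) {
        try { return future.get(); }
        catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

If the payments service hangs, at most four threads are tied up waiting on it; inventory calls continue unaffected on their own pool.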

Use the Bulkhead Pattern when:

  1. Fault Isolation: You want to isolate failures within a microservice or component, ensuring that issues in one part of the system do not cascade and affect other parts.
  2. Resilience: To enhance the overall resilience of your system by minimizing the impact of failures and preventing a single failure from bringing down the entire system.
  3. Resource Allocation: When you need to allocate resources (like threads or connections) separately for different functions within a microservice, ensuring one function’s resource usage doesn’t starve others.

Do not use the Bulkhead Pattern when:

  1. Simple, Single-Function Services: In simple microservices with only one primary function, the added complexity of bulkheading may not be necessary and can introduce overhead.
  2. Infinite Isolation: Overly aggressive bulkheading can lead to excessive resource consumption, complexity, and difficulties in managing and monitoring the system.
  3. Extreme Real-Time Requirements: For systems with extreme real-time requirements, the added latency and complexity introduced by bulkheading may not be acceptable.

The Bulkhead Pattern is a valuable approach for enhancing fault tolerance and resilience in microservices but should be employed judiciously based on the specific complexity and requirements of your application.

Retry Pattern

The Retry Pattern in microservices is a strategy used to handle transient failures by automatically reattempting failed operations. When a service encounters a temporary issue, such as a network hiccup or a brief unavailability of a dependent service, the Retry Pattern helps by retrying the operation a predefined number of times, with delays in between each attempt.
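A minimal retry helper might look like the following. The names and the fixed delay between attempts are illustrative; real implementations such as Spring Retry or Resilience4j typically add exponential backoff and jitter.

```java
import java.util.function.Supplier;

// Minimal retry helper for transient failures.
class Retry {
    // Retry up to maxAttempts times, sleeping delayMillis between attempts.
    static <T> T withRetry(Supplier<T> operation, int maxAttempts, long delayMillis) {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);  // wait before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
        throw lastFailure;  // all attempts exhausted: surface the last error
    }
}
```

Crucially, this only helps with transient failures; as noted below, a permanent failure will simply fail maxAttempts times.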

Use the Retry Pattern when:

  1. Transient Issues: You encounter transient failures that are likely to resolve themselves with time, such as network glitches or momentary service unavailability.
  2. Reducing Manual Intervention: To automate error recovery and reduce the need for manual intervention in handling routine, recoverable failures.
  3. Load Balancing: In load balancing scenarios, retries can distribute requests more evenly across multiple instances of a service.

Do not use the Retry Pattern when:

  1. Permanent Failures: It’s essential to differentiate between transient and permanent failures. If a failure is non-transient, retrying will not resolve the issue, and it’s better to handle it differently.
  2. Increasing Load: Excessive retries can increase the load on services and exacerbate issues, especially if they are resource-intensive or have limited capacity.
  3. Timeout-sensitive: For operations with strict time constraints, like real-time processing, retries might not be suitable as they can introduce unwanted delays.

The Retry Pattern is valuable for handling transient failures and automating error recovery but should be applied judiciously, considering the nature of failures and the impact of retries on system load and responsiveness.

Timeout Pattern

The Timeout Pattern in microservices involves setting a specific time limit for a service operation to complete. If the operation exceeds this limit, it’s considered a timeout, and the system can respond accordingly.
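A simple way to sketch this in Java is to run the operation on a worker thread and give up with a fallback once the limit is exceeded. The names here are invented for the illustration; in practice you would more often configure timeouts directly on an HTTP client such as RestTemplate or WebClient.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative timeout wrapper around a blocking operation.
class Timeouts {
    private static final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);  // don't keep the JVM alive for timed-out calls
        return t;
    });

    static <T> T callWithTimeout(Callable<T> operation, long limitMillis, T fallback) {
        Future<T> future = pool.submit(operation);
        try {
            return future.get(limitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);  // stop waiting; interrupt the slow call
            return fallback;
        } catch (Exception e) {
            return fallback;      // operation itself failed
        }
    }
}
```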

Use the Timeout Pattern when:

  1. Response Guarantees: You need guarantees on the maximum time it takes to receive a response from a service, ensuring that slow or unresponsive services don’t impact your system’s overall performance.
  2. Graceful Degradation: To gracefully degrade service quality when a component is slow or unresponsive, allowing other parts of the system to continue functioning without waiting indefinitely.
  3. Resource Management: To manage finite resources efficiently by preventing resources from being locked by a single slow operation.

Do not use the Timeout Pattern when:

  1. Critical Real-Time Constraints: For operations with strict real-time requirements, setting timeouts might not be suitable, as they can lead to potential data loss or service disruption.
  2. Complex Error Handling: If the error handling for timeout scenarios introduces unnecessary complexity or poses security risks, consider alternative patterns.
  3. Misleading Metrics: When setting short timeouts, be cautious, as they can lead to inaccurate performance metrics if they result in frequent timeouts, making it difficult to distinguish between genuine problems and configured timeouts.

The Timeout Pattern is valuable for controlling response times and ensuring the resilience of microservices but should be used with consideration of the specific operational requirements and potential impacts on system behavior.

Rate Limiter

A Rate Limiter in microservices is a mechanism used to control the rate or frequency at which incoming requests are processed by a service or API. It helps prevent overloading, abuse, or excessive resource consumption by limiting the number of requests a client can make within a specified time frame.
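One of the simplest variants, a fixed-window counter, can be sketched as follows. The names and limits are illustrative; production systems often prefer token-bucket or sliding-window algorithms (for example Resilience4j's RateLimiter, or Redis-backed counters for a limit shared across instances).

```java
// Minimal fixed-window rate limiter: allow at most `limit` requests
// per `windowMillis`. All state is in-process, so each service instance
// enforces its own, independent limit.
class FixedWindowRateLimiter {
    private final int limit;
    private final long windowMillis;
    private long windowStart = System.currentTimeMillis();
    private int count = 0;

    FixedWindowRateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {  // new window: reset the counter
            windowStart = now;
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;   // request admitted
        }
        return false;      // over the limit: caller would reject, e.g. HTTP 429
    }
}
```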

Use the Rate Limiter when:

  1. Protecting Resources: You need to protect your microservice or API from being overwhelmed by a high volume of incoming requests, ensuring fair resource allocation.
  2. Rate-Based Billing: When billing or monetization is based on the number of requests, a rate limiter can enforce usage limits and prevent abuse.
  3. Stabilizing Systems: To stabilize systems during traffic spikes or distributed denial of service (DDoS) attacks by capping the rate of incoming requests.

Do not use the Rate Limiter when:

  1. Real-Time Responsiveness: For real-time applications where immediate response is critical, a rate limiter might introduce unwanted delays.
  2. Complex Authorization Logic: When the authorization logic is complex and dependent on multiple factors beyond just request frequency, a rate limiter may not be sufficient for access control.
  3. Bypassing Security: If not appropriately configured or monitored, rate limiting can be bypassed by sophisticated attackers, so it should not be relied upon as the sole security measure.

A Rate Limiter is a useful tool for managing request rates and protecting microservices but should be applied thoughtfully, considering the specific needs and security requirements of your application.

Caching

The Caching Pattern in microservices involves storing and retrieving frequently accessed data in a cache, which is a high-speed, temporary storage location. This pattern is used to reduce latency, minimize the load on underlying services, and improve overall system performance by serving data quickly from the cache when possible.
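As an in-process illustration, a small LRU cache can be built on LinkedHashMap's access-order mode. This is a sketch only: the class name is invented, and a shared cache such as Redis would be used when several service instances must see the same entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal in-process LRU cache: the least-recently-used entry is
// evicted once the cache grows past maxEntries.
class LruCache<K, V> {
    private final Map<K, V> map;

    LruCache(final int maxEntries) {
        // accessOrder=true makes iteration order least-recently-used first.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Return the cached value, or load and cache it on a miss.
    synchronized V getOrLoad(K key, Function<K, V> loader) {
        return map.computeIfAbsent(key, loader);
    }

    synchronized boolean contains(K key) { return map.containsKey(key); }
}
```

Because getOrLoad only calls the loader on a miss, repeated reads of hot keys skip the expensive database or service call entirely.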

Use the Caching Pattern when:

  1. High Data Access Frequency: Data that is frequently accessed and relatively static benefits from caching to avoid redundant and time-consuming database or service calls.
  2. Latency Sensitivity: To decrease response times in latency-sensitive applications, such as real-time analytics or content delivery.
  3. Load Balancing: Caching helps distribute the load and reduce stress on backend services during traffic spikes or heavy usage.

Do not use the Caching Pattern when:

  1. Data Consistency: When data must remain highly consistent and up-to-date across all requests, as caching introduces a risk of serving outdated information.
  2. Complex Data Invalidation: In situations where the complexity of cache data invalidation surpasses the benefits of caching.
  3. Resource Constraints: In cases of resource-constrained environments, like microservices running in memory-limited containers, excessive caching can consume valuable memory.

The Caching Pattern is a powerful tool for improving microservices’ performance and reducing latency, but its application should consider data consistency requirements and resource constraints to strike the right balance between performance and accuracy.

Resiliency patterns libraries & frameworks

To put these patterns into practice in Spring Boot or other Java microservices, several libraries and frameworks are available. Resilience4j is a well-known Java library for building robust, failure-tolerant applications; it provides circuit breakers, bulkheads, rate limiters, retries, and timeouts out of the box. Netflix Hystrix also implements circuit breakers, although it is now in maintenance mode and Resilience4j is the usual replacement. For the Bulkhead approach you can isolate components with dedicated thread pools, and for retries there is Spring Retry. For the Timeout Pattern, you can configure time limits on HTTP clients such as RestTemplate or WebClient. Lastly, if you want to control how fast requests are made, Google Guava’s RateLimiter or Redis-backed counters can provide rate limiting.
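For example, with the resilience4j-spring-boot starter, a circuit breaker and a retry can be configured declaratively in application.yml. The instance name backendService and the numeric values below are illustrative placeholders, not recommendations:

```yaml
resilience4j:
  circuitbreaker:
    instances:
      backendService:
        failureRateThreshold: 50                      # % of failed calls that trips the breaker
        waitDurationInOpenState: 30s                  # time in Open before probing
        permittedNumberOfCallsInHalfOpenState: 5      # probe calls in Half-Open
  retry:
    instances:
      backendService:
        maxAttempts: 3
        waitDuration: 500ms
```

The configured instance is then applied to a Spring bean method with annotations such as @CircuitBreaker(name = "backendService") and @Retry(name = "backendService").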

Conclusion

This article discussed six microservices resiliency patterns and when (and when not) to use them: the Circuit Breaker, Bulkhead, Retry, Timeout, Rate Limiter, and Caching patterns. Applied judiciously, these patterns enhance the reliability and performance of microservices by addressing specific challenges related to failures, resource allocation, and response times.
