<doc-view>

<h2 id="_contents">Contents</h2>
<div class="section">
<ul class="ulist">
<li>
<p><router-link to="#_overview" @click.native="this.scrollFix('#_overview')">Overview</router-link></p>

</li>
<li>
<p><router-link to="#maven-coordinates" @click.native="this.scrollFix('#maven-coordinates')">Maven Coordinates</router-link></p>

</li>
<li>
<p><router-link to="#_api" @click.native="this.scrollFix('#_api')">API</router-link></p>

</li>
<li>
<p><router-link to="#_configuration" @click.native="this.scrollFix('#_configuration')">Configuration</router-link></p>

</li>
<li>
<p><router-link to="#_examples" @click.native="this.scrollFix('#_examples')">Examples</router-link></p>

</li>
<li>
<p><router-link to="#_additional_information" @click.native="this.scrollFix('#_additional_information')">Additional Information</router-link></p>

</li>
</ul>

</div>


<h2 id="_overview">Overview</h2>
<div class="section">
<p>Helidon SE Fault Tolerance support is inspired by <a target="_blank" href="https://download.eclipse.org/microprofile/microprofile-fault-tolerance-4.0/microprofile-fault-tolerance-spec-4.0.html">MicroProfile Fault Tolerance</a>.
The API defines the notion of a <em>fault handler</em> that can be combined with other handlers to
improve application robustness. Handlers are created to manage error conditions (faults)
that may occur in real-world application environments. Examples include service restarts,
network delays, temporary infrastructure instabilities, etc.</p>

<p>The interaction of multiple microservices brings some new challenges from distributed systems
that require careful planning. Faults in distributed systems should be compartmentalized
to avoid unnecessary service interruptions. For example, if comparable information can
be obtained from multiple sources, a user request <em>should not</em> be denied when a subset
of these sources is unreachable or offline. Similarly, if a non-essential source has been
flagged as unreachable, an application should avoid continuous access to that source
as that would result in much higher response times.</p>

<p>In order to tackle the most common types of application faults, the Helidon SE Fault Tolerance API
provides support for circuit breakers, retries, timeouts, bulkheads and fallbacks.
In addition, the API makes it very easy to create and monitor asynchronous tasks that
do not require explicit creation and management of threads or executors.</p>

<p>For more information the reader is referred to the
<a target="_blank" href="./apidocs/io.helidon.reactive.faulttolerance/module-summary.html">Fault Tolerance SE API Javadocs</a>.</p>

</div>


<h2 id="maven-coordinates">Maven Coordinates</h2>
<div class="section">
<p>To enable Fault Tolerance
add the following dependency to your project&#8217;s <code>pom.xml</code> (see
 <router-link to="/about/managing-dependencies">Managing Dependencies</router-link>).</p>

<markup
lang="xml"

>&lt;dependency&gt;
    &lt;groupId&gt;io.helidon.reactive.fault-tolerance&lt;/groupId&gt;
    &lt;artifactId&gt;helidon-reactive-fault-tolerance&lt;/artifactId&gt;
&lt;/dependency&gt;</markup>

</div>


<h2 id="_api">API</h2>
<div class="section">
<p>The SE Fault Tolerance API is <em>reactive</em> in order to fit the overall processing model in
Helidon SE. A task returns either a <code>Single&lt;T&gt;</code> or a <code>Multi&lt;T&gt;</code>.
A <code>Single&lt;T&gt;</code> is a promise to produce zero or one value of type <code>T</code> or signal an error;
while a <code>Multi&lt;T&gt;</code> is a promise to produce zero or more values of type <code>T</code> or signal an error.</p>

<div class="admonition note">
<p class="admonition-inline">A <code>Single&lt;T&gt;</code>, like <code>CompletableFuture&lt;T&gt;</code>, extends <code>CompletionStage&lt;T&gt;</code>
so conversion among these types is straightforward.</p>
</div>

<p>In the sections that follow, we shall briefly explore each of the constructs provided
by this API.</p>


<h3 id="_asynchronous">Asynchronous</h3>
<div class="section">
<p>Asynchronous tasks can be created or forked by using an <code>Async</code> instance. A supplier of type
<code>T</code> is provided as the argument when invoking this handler. For example:</p>

<markup
lang="java"

>Single&lt;Thread&gt; s = Async.create().invoke(() -&gt; Thread.currentThread());
s.thenAccept(t -&gt; System.out.println("Async task executed in thread " + t));</markup>

<p>The supplier <code>() &#8594; Thread.currentThread()</code> is executed in a new thread and
the value it produces is printed by the consumer passed to <code>thenAccept</code>.</p>

<div class="admonition note">
<p class="admonition-inline">The method reference <code>Thread::currentThread</code> is a simplified way of
providing a supplier in the example above.</p>
</div>

<p>Asynchronous tasks are executed in a thread pool managed by the Helidon SE
Fault Tolerance module. Thread pools are created during the initialization
phase of class <code>io.helidon.reactive.faulttolerance.FaultTolerance</code> and can be
configured for your application.</p>
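
<p>The idea of running a supplier on a managed pool can be illustrated with JDK types only.
The following is a minimal sketch, not the Helidon implementation: the class name
<code>AsyncSketch</code>, the pool size and the thread name are assumptions made for illustration,
and <code>CompletableFuture</code> stands in for <code>Single</code>.</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class AsyncSketch {
    // Hypothetical pool; daemon threads so the JVM can exit when main finishes.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "async-sketch");
        t.setDaemon(true);
        return t;
    });

    /** Runs the supplier on the pool and returns a stage for its result. */
    static <T> CompletableFuture<T> invoke(Supplier<T> supplier) {
        return CompletableFuture.supplyAsync(supplier, POOL);
    }

    public static void main(String[] args) {
        invoke(Thread::currentThread)
                .thenAccept(t -> System.out.println("Async task executed in thread " + t))
                .join(); // wait so the message is printed before the JVM exits
    }
}
```

<p>The caller never creates a thread directly; it only supplies the work and reacts to the result.</p>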

</div>


<h3 id="_retries">Retries</h3>
<div class="section">
<p>Temporary networking problems can sometimes be mitigated by simply retrying
a certain task. A <code>Retry</code> handler is created using a <code>RetryPolicy</code> that
indicates the number of retries, delay between retries, etc.</p>

<markup
lang="java"

>Retry retry = Retry.builder()
                   .retryPolicy(Retry.JitterRetryPolicy.builder()
                                     .calls(3)
                                     .delay(Duration.ofMillis(100))
                                     .build())
                   .build();
retry.invoke(this::retryOnFailure);</markup>

<p>The sample code above will call the supplier <code>this::retryOnFailure</code>
up to 3 times in total, with a 100 millisecond delay between calls.</p>

<div class="admonition note">
<p class="admonition-inline">The return type of method <code>retryOnFailure</code> in the example above must
be <code>CompletionStage&lt;T&gt;</code> and the parameter to the retry handler&#8217;s <code>invoke</code>
method <code>Supplier&lt;? extends CompletionStage&lt;T&gt;&gt;</code>.</p>
</div>

<p>If the <code>CompletionStage&lt;T&gt;</code> returned by the method completes exceptionally,
the call will be treated as a failure and retried until the maximum number
of attempts is reached; finer control is
possible by creating a retry policy and using methods such as
<code>applyOn(Class&lt;? extends Throwable&gt;&#8230;&#8203; classes)</code> and
<code>skipOn(Class&lt;? extends Throwable&gt;&#8230;&#8203; classes)</code> to control those exceptions
on which to act and those that can be ignored.</p>
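
<p>The retry-until-success behavior described above can be sketched with JDK types alone.
This is an illustrative sketch under stated assumptions, not Helidon's <code>Retry</code>:
the class <code>RetrySketch</code> and its methods are hypothetical, delays between calls are
omitted, and every exceptional completion is treated as retryable.</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Supplier;

final class RetrySketch {
    /** Calls the supplier up to 'calls' times in total, stopping at the first success. */
    static <T> CompletableFuture<T> retry(Supplier<? extends CompletionStage<T>> supplier, int calls) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(supplier, calls, result);
        return result;
    }

    private static <T> void attempt(Supplier<? extends CompletionStage<T>> supplier,
                                    int remaining,
                                    CompletableFuture<T> result) {
        CompletionStage<T> stage;
        try {
            stage = supplier.get();
        } catch (RuntimeException e) {
            // A supplier that throws is treated like a stage that failed
            stage = CompletableFuture.failedFuture(e);
        }
        stage.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (remaining > 1) {
                attempt(supplier, remaining - 1, result); // try again
            } else {
                result.completeExceptionally(error);      // attempts exhausted
            }
        });
    }
}
```

<p>A real policy would additionally schedule each new attempt after the configured delay
and filter exceptions in the spirit of <code>applyOn</code> and <code>skipOn</code>.</p>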

</div>


<h3 id="_timeouts">Timeouts</h3>
<div class="section">
<p>A request to a service that is inaccessible or simply unavailable should be bounded
to ensure a certain quality of service and response time. Timeouts can be configured
to avoid excessive waiting times. In addition, a fallback action can be defined
if a timeout expires as we shall cover in the next section.</p>

<p>The following is an example of using <code>Timeout</code>:</p>

<markup
lang="java"

>Single&lt;T&gt; s = Timeout.create(Duration.ofMillis(10)).invoke(this::mayTakeVeryLong);
s.handle((t, e) -&gt; {
    if (e instanceof TimeoutException) {
        // Invocation has timed out!
    }
    //...
});</markup>

<p>The example above monitors the call to method <code>mayTakeVeryLong</code> and reports a
<code>TimeoutException</code> if the execution takes more than 10 milliseconds to complete.</p>
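
<p>For comparison, the JDK offers a similar building block in <code>CompletableFuture.orTimeout</code>
(Java 9 and later). The sketch below is only an analogy to the <code>Timeout</code> handler, not its
implementation; the class <code>TimeoutSketch</code> and the method <code>timesOut</code> are
hypothetical names.</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {
    /** Returns true if the given future does not complete within 'millis' milliseconds. */
    static boolean timesOut(CompletableFuture<String> slow, long millis) {
        return slow.orTimeout(millis, TimeUnit.MILLISECONDS)
                .handle((value, error) -> {
                    // Dependent stages may see the failure wrapped in a CompletionException
                    Throwable cause = error instanceof CompletionException ? error.getCause() : error;
                    return cause instanceof TimeoutException;
                })
                .join();
    }
}
```

<p>Passing a future that never completes yields <code>true</code>, while an already completed
future yields <code>false</code>.</p>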

</div>


<h3 id="_fallbacks">Fallbacks</h3>
<div class="section">
<p>A fallback to a <em>known</em> result can sometimes be an alternative to
reporting an error. For example, if we are unable to access a service
we may fall back to the last result obtained from that service.</p>

<p>A <code>Fallback</code> instance is created by providing a function that takes a <code>Throwable</code>
and produces a <code>CompletionStage&lt;T&gt;</code> as shown next:</p>

<markup
lang="java"

>Single&lt;T&gt; single = Fallback.create(
    throwable -&gt; Single.just(lastKnownValue)).invoke(this::mayFail);
single.thenAccept(t -&gt; {
    //...
});</markup>

<p>In this example, we register a function that can produce a <code>Single&lt;T&gt;</code> (which implements
<code>CompletionStage&lt;T&gt;</code>) if the call to <code>this::mayFail</code> completes exceptionally.</p>
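
<p>The contract of a fallback function (a <code>Throwable</code> in, a <code>CompletionStage&lt;T&gt;</code> out)
can be sketched with JDK types as follows. This is a minimal illustration, not Helidon's
<code>Fallback</code>; the class <code>FallbackSketch</code> and the method <code>withFallback</code>
are assumptions.</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;
import java.util.function.Supplier;

final class FallbackSketch {
    /** Invokes 'call'; if its stage fails, completes with the stage produced by 'fallback'. */
    static <T> CompletionStage<T> withFallback(Supplier<? extends CompletionStage<T>> call,
                                               Function<Throwable, ? extends CompletionStage<T>> fallback) {
        CompletableFuture<T> result = new CompletableFuture<>();
        call.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else {
                // Delegate to the fallback stage and mirror its outcome
                fallback.apply(error).whenComplete((fallbackValue, fallbackError) -> {
                    if (fallbackError == null) {
                        result.complete(fallbackValue);
                    } else {
                        result.completeExceptionally(fallbackError);
                    }
                });
            }
        });
        return result;
    }
}
```

<p>Note that the fallback itself may fail, in which case the caller still observes an error.</p>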

</div>


<h3 id="_circuit_breakers">Circuit Breakers</h3>
<div class="section">
<p>Failing to execute a certain task or call another service repeatedly can have a direct
impact on application performance. It is often preferred to avoid calls to non-essential
services by simply preventing that logic from executing altogether. A circuit breaker can be
configured to monitor such calls and block attempts that are likely to fail, thus improving
overall performance.</p>

<p>Circuit breakers start in a <em>closed</em> state, letting calls proceed normally; after
detecting a certain number of errors during a pre-defined processing window, they can <em>open</em> to
prevent additional failures. After a circuit has been opened, it can transition
first to a <em>half-open</em> state before finally transitioning back to a closed state.
The use of an intermediate state (half-open)
makes transitions from open to closed more progressive, and prevents a circuit breaker
from eagerly transitioning to states without considering "sufficient" observations.</p>

<div class="admonition note">
<p class="admonition-inline">Any failure while a circuit breaker is in half-open state will immediately
cause it to transition back to an open state.</p>
</div>

<p>Consider the following example in which <code>this::mayFail</code> is monitored by a
circuit breaker:</p>

<markup
lang="java"

>CircuitBreaker breaker = CircuitBreaker.builder()
                                       .volume(10)
                                       .errorRatio(30)
                                       .delay(Duration.ofMillis(200))
                                       .successThreshold(2)
                                       .build();
Single&lt;T&gt; result = breaker.invoke(this::mayFail);</markup>

<p>The circuit breaker in this example defines a processing window of size 10, an error
ratio of 30%, a duration to transition to half-open state of 200 milliseconds, and
a success threshold to transition from half-open to closed state of 2 observations.
It follows that,</p>

<ul class="ulist">
<li>
<p>After completing the processing window, if at least 3 errors were detected, the
circuit breaker will transition to the open state, thus blocking the execution
of any subsequent calls.</p>

</li>
<li>
<p>After 200 milliseconds, the circuit breaker will transition to half-open and
enable calls to proceed again.</p>

</li>
<li>
<p>If the next two calls after transitioning to half-open are successful, the
circuit breaker will transition to closed state; otherwise, it will
transition back to open state, waiting for another 200 milliseconds
before attempting to transition to half-open again.</p>

</li>
</ul>

<p>A circuit breaker will throw an
<code>io.helidon.reactive.faulttolerance.CircuitBreakerOpenException</code>
if an attempt to make an invocation takes place while it is in open state.</p>
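
<p>The state transitions described in this section can be summarized as a small state machine.
The sketch below is an illustration under stated assumptions, not Helidon's <code>CircuitBreaker</code>:
the class <code>BreakerSketch</code> is hypothetical, time is passed in explicitly as milliseconds,
the error ratio is evaluated only once the rolling window is full, and the sketch is not thread-safe.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int volume;            // size of the rolling window
    private final int errorRatio;        // failure percentage that opens the circuit
    private final long delayMillis;      // time spent OPEN before moving to HALF_OPEN
    private final int successThreshold;  // successes needed in HALF_OPEN to close

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;
    private int halfOpenSuccesses;

    BreakerSketch(int volume, int errorRatio, long delayMillis, int successThreshold) {
        this.volume = volume;
        this.errorRatio = errorRatio;
        this.delayMillis = delayMillis;
        this.successThreshold = successThreshold;
    }

    /** Returns true if a call may proceed at time 'nowMillis'. */
    boolean allow(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= delayMillis) {
            state = State.HALF_OPEN;
            halfOpenSuccesses = 0;
        }
        return state != State.OPEN;
    }

    /** Records the outcome of a call that was allowed to proceed. */
    void record(boolean failed, long nowMillis) {
        if (state == State.OPEN) {
            return; // calls are blocked while open, nothing to record
        }
        if (state == State.HALF_OPEN) {
            if (failed) {
                open(nowMillis); // any failure re-opens immediately
            } else if (++halfOpenSuccesses >= successThreshold) {
                state = State.CLOSED;
                window.clear();
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > volume) {
            window.removeFirst();
        }
        if (window.size() == volume) {
            long failures = window.stream().filter(f -> f).count();
            if (failures * 100 >= (long) errorRatio * volume) {
                open(nowMillis);
            }
        }
    }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAt = nowMillis;
        window.clear();
    }

    State state() {
        return state;
    }
}
```

<p>With the values from the example (volume 10, error ratio 30, delay 200 milliseconds, success threshold 2),
three failures in a full window open the circuit, and two consecutive successes in half-open state close it again.</p>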

</div>


<h3 id="_bulkheads">Bulkheads</h3>
<div class="section">
<p>Concurrent access to certain components may need to be limited to avoid
excessive use of resources. For example, if an invocation that opens
a network connection is allowed to execute concurrently without
any restriction, and if the service on the other end is slow responding,
it is possible for the rate at which network connections are opened
to exceed the maximum number of connections allowed. Faults of this
type can be prevented by guarding these invocations using a bulkhead.</p>

<div class="admonition note">
<p class="admonition-inline">The origin of the name <em>bulkhead</em> comes from the partitions that
comprise a ship&#8217;s hull. If some partition is somehow compromised
(e.g., filled with water) it can be isolated in a manner not to
affect the rest of the hull.</p>
</div>

<p>A waiting queue can be associated with a bulkhead to handle tasks
that are submitted when the bulkhead is already at full capacity.</p>

<markup
lang="java"

>Bulkhead bulkhead = Bulkhead.builder()
                            .limit(3)
                            .queueLength(5)
                            .build();
Single&lt;T&gt; single = bulkhead.invoke(this::usesResources);</markup>

<p>This example creates a bulkhead that limits concurrent executions
of <code>this::usesResources</code> to at most 3, with a queue of size 5. The
bulkhead will report an <code>io.helidon.reactive.faulttolerance.BulkheadException</code> if unable to proceed
with the call, that is, when the limit has been reached and the queue
is also at maximum capacity.</p>
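
<p>The admission logic of a bulkhead with a waiting queue can be sketched as simple bookkeeping.
This is an illustrative model only, not Helidon's <code>Bulkhead</code>: the class
<code>BulkheadSketch</code>, the <code>Decision</code> enum and both methods are hypothetical,
and the sketch tracks counts rather than executing tasks.</p>

```java
final class BulkheadSketch {
    enum Decision { RUN, QUEUED, REJECTED }

    private final int limit;        // max number of concurrent calls
    private final int queueLength;  // max number of waiting calls
    private int running;
    private int queued;

    BulkheadSketch(int limit, int queueLength) {
        this.limit = limit;
        this.queueLength = queueLength;
    }

    /** Decides what happens to a newly submitted call. */
    synchronized Decision submit() {
        if (running < limit) {
            running++;
            return Decision.RUN;
        }
        if (queued < queueLength) {
            queued++;
            return Decision.QUEUED;
        }
        return Decision.REJECTED; // a real bulkhead would report an exception here
    }

    /** Signals that a running call has finished. */
    synchronized void complete() {
        if (running == 0) {
            throw new IllegalStateException("no call is running");
        }
        if (queued > 0) {
            queued--;   // a waiting call takes over the freed slot
        } else {
            running--;
        }
    }
}
```

<p>With a limit of 3 and a queue length of 5, the first three submissions run, the next five wait,
and the ninth is rejected until a running call completes.</p>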

</div>


<h3 id="_handler_composition">Handler Composition</h3>
<div class="section">
<p>Method invocations can be guarded by any combination of the handlers
presented above. For example, an invocation that
times out can be retried a few times before resorting to a fallback value
&mdash;assuming it never succeeds.</p>

<p>The easiest way to achieve handler composition is by using a builder in the
<code>FaultTolerance</code> class as shown in the following example:</p>

<markup
lang="java"

>FaultTolerance.TypedBuilder&lt;T&gt; builder = FaultTolerance.typedBuilder();

// Create and add timeout
Timeout timeout = Timeout.create(Duration.ofMillis(10));
builder.addTimeout(timeout);

// Create and add retry
Retry retry = Retry.builder()
                   .retryPolicy(Retry.JitterRetryPolicy.builder()
                                     .calls(3)
                                     .delay(Duration.ofMillis(100))
                                     .build())
                   .build();
builder.addRetry(retry);

// Create and add fallback
Fallback fallback = Fallback.create(throwable -&gt; Single.just(lastKnownValue));
builder.addFallback(fallback);

// Finally call the method
Single&lt;T&gt; single = builder.build().invoke(this::mayTakeVeryLong);</markup>

<p>The exact order in which handlers are added to a builder depends on the use case,
but generally the order starting from innermost to outermost should be: bulkhead,
timeout, circuit breaker, retry and fallback. That is, fallback is the first
handler in the chain (the last to be executed once a value is returned)
and bulkhead is the last one (the first to be executed once a value is returned).</p>

<div class="admonition note">
<p class="admonition-inline">This is the ordering used by the MicroProfile Fault Tolerance implementation
in Helidon when a method is decorated with multiple annotations.</p>
</div>
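
<p>The innermost-to-outermost ordering can be visualized by wrapping a call in logging layers.
The sketch below does not use the Helidon API; <code>CompositionSketch</code>, <code>wrap</code> and
<code>trace</code> are hypothetical names, and each layer only records when it is entered and left.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class CompositionSketch {
    /** Wraps 'inner' so that entering and leaving the named handler is recorded. */
    static Supplier<String> wrap(String name, Supplier<String> inner, List<String> log) {
        return () -> {
            log.add("enter " + name);
            String value = inner.get();
            log.add("exit " + name);
            return value;
        };
    }

    /** Builds the nesting from innermost to outermost and returns the recorded trace. */
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Supplier<String> call = () -> "value";
        for (String name : List.of("bulkhead", "timeout", "circuit breaker", "retry", "fallback")) {
            call = wrap(name, call, log);
        }
        call.get();
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

<p>The trace starts with <code>enter fallback</code> and ends with <code>exit fallback</code>, while
<code>bulkhead</code> is the last layer entered and the first one left.</p>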

</div>


<h3 id="_revisiting_multis">Revisiting Multi&#8217;s</h3>
<div class="section">
<p>All the examples presented so far have focused on invocations returning
a single value of type <code>Single&lt;T&gt;</code>. If the invocation in question can return
more than one value (i.e., a <code>Multi&lt;T&gt;</code>) then all that is needed is to use
the method <code>invokeMulti</code> instead of <code>invoke</code>. The supplier passed to this
method must return a <code>Flow.Publisher&lt;T&gt;</code> instead of a <code>CompletionStage&lt;T&gt;</code>.</p>

<p>A <code>Flow.Publisher&lt;T&gt;</code> is a generalization of a <code>Single&lt;T&gt;</code> that can
produce zero or more values. Note that a <code>Flow.Publisher&lt;T&gt;</code>, unlike a
<code>Single&lt;T&gt;</code>, can report an error after
producing one or more values, introducing additional challenges if all
values must be processed transactionally, that is, in an all or nothing
manner.</p>

<p>The following example creates an instance of <code>Retry</code> and invokes
the <code>invokeMulti</code> method; it then registers a subscriber to process
the results:</p>

<markup
lang="java"

>Retry retry = Retry.builder()
                   .retryPolicy(Retry.JitterRetryPolicy.builder()
                                     .calls(2)
                                     .build())
                   .build();
Multi&lt;Integer&gt; multi = retry.invokeMulti(() -&gt; Multi.just(0, 1, 2));

IntSubscriber ts = new IntSubscriber();
multi.subscribe(ts);
ts.request(Integer.MAX_VALUE);</markup>

<p>The call to <code>Multi.just(0, 1, 2)</code> simply returns a multi that produces
the integers 0, 1 and 2. If an error was generated during this process,
the policy will retry the call one more time &mdash;for a total of 2
calls.</p>
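
<p>The <code>IntSubscriber</code> class used above is not shown in this guide. One possible shape,
written against the JDK <code>Flow</code> API, is sketched below. Both the subscriber and the helper
<code>MultiSketch.just</code> (a tiny synchronous publisher used here in place of <code>Multi.just</code>)
are assumptions made for illustration; neither is thread-safe.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

final class IntSubscriber implements Flow.Subscriber<Integer> {
    private Flow.Subscription subscription;
    final List<Integer> items = new ArrayList<>();
    Throwable error;
    boolean completed;

    @Override public void onSubscribe(Flow.Subscription s) { subscription = s; }
    @Override public void onNext(Integer item) { items.add(item); }
    @Override public void onError(Throwable t) { error = t; }
    @Override public void onComplete() { completed = true; }

    /** Requests up to n more items from the upstream publisher. */
    void request(long n) { subscription.request(n); }
}

final class MultiSketch {
    /** A synchronous publisher that emits the given values on demand, then completes. */
    static Flow.Publisher<Integer> just(int... values) {
        return subscriber -> subscriber.onSubscribe(new Flow.Subscription() {
            private int next;
            private boolean done;

            @Override public void request(long n) {
                while (!done && n-- > 0 && next < values.length) {
                    subscriber.onNext(values[next++]);
                }
                if (!done && next == values.length) {
                    done = true;
                    subscriber.onComplete();
                }
            }

            @Override public void cancel() { done = true; }
        });
    }
}
```

<p>Because items are only delivered on request, a subscriber that has received some values may
still see <code>onError</code> later, which is the transactional challenge mentioned above.</p>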

</div>

</div>


<h2 id="_configuration">Configuration</h2>
<div class="section">
<p>Each Fault Tolerance handler can be individually configured at build
time. This is supported by calling the <code>config</code> method on the corresponding
builder and specifying a config element. For example, a <code>Timeout</code> handler
can be externally configured as follows:</p>

<markup
lang="java"

>   Timeout timeout = Timeout.builder()
           .config(config.get("timeout"))
           .build();</markup>

<p>and using the following config entry:</p>

<markup
lang="yaml"

>timeout:
  timeout: "PT20S"
  current-thread: true
  name: "MyTimeout"
  cancel-source: false</markup>

<p>Note that the actual timeout value is of type <code>Duration</code>, hence the use
of <code>PT20S</code> that represents a timeout of 20 seconds. See the Javadoc for the <code>Duration</code>
class for more information.</p>
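
<p>Duration strings follow the ISO-8601 format accepted by <code>java.time.Duration.parse</code>.
The short sketch below uses only the JDK to show how the values that appear in this guide are
interpreted; the class name <code>DurationSketch</code> is an assumption.</p>

```java
import java.time.Duration;

final class DurationSketch {
    public static void main(String[] args) {
        Duration timeout = Duration.parse("PT20S");
        System.out.println(timeout.getSeconds());                 // 20
        System.out.println(Duration.parse("PT0.2S").toMillis());  // 200
        System.out.println(Duration.ofSeconds(10));               // PT10S
    }
}
```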

<p>The following tables list all the config elements for each type of
handler supported by this API.</p>


<h3 id="_timeout">Timeout</h3>
<div class="section">

<div class="table__overflow elevation-1  flex sm7
">
<table class="datatable table">
<colgroup>
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 60%;">
</colgroup>
<thead>
</thead>
<tbody>
<tr>
<td class="">Property</td>
<td class="">Type</td>
<td class="">Description</td>
</tr>
<tr>
<td class="">name</td>
<td class="">String</td>
<td class="">A name given to the task for debugging purposes. Default is <code>Timeout-N</code>.</td>
</tr>
<tr>
<td class="">timeout</td>
<td class="">Duration</td>
<td class="">The timeout length as a Duration string. Default is <code>PT10S</code> or 10 seconds.</td>
</tr>
<tr>
<td class="">current-thread</td>
<td class="">boolean</td>
<td class="">A flag indicating whether the task should execute in the current thread or not.
Default is <code>false</code>.</td>
</tr>
<tr>
<td class="">cancel-source</td>
<td class="">boolean</td>
<td class="">A flag indicating if this task&#8217;s source should be cancelled if the task is cancelled.
Default is <code>true</code>.</td>
</tr>
</tbody>
</table>
</div>

</div>


<h3 id="_circuit_breaker">Circuit Breaker</h3>
<div class="section">

<div class="table__overflow elevation-1  flex sm7
">
<table class="datatable table">
<colgroup>
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 60%;">
</colgroup>
<thead>
</thead>
<tbody>
<tr>
<td class="">Property</td>
<td class="">Type</td>
<td class="">Description</td>
</tr>
<tr>
<td class="">name</td>
<td class="">String</td>
<td class="">A name given to the task for debugging purposes. Default is <code>CircuitBreaker-N</code>.</td>
</tr>
<tr>
<td class="">delay</td>
<td class="">Duration</td>
<td class="">Delay to transition from <em>open</em> to <em>half-open</em> state. Default is <code>PT5S</code> or 5 seconds.</td>
</tr>
<tr>
<td class="">error-ratio</td>
<td class="">int</td>
<td class="">Failure percentage to transition to <em>open</em> state. Default is 60.</td>
</tr>
<tr>
<td class="">volume</td>
<td class="">int</td>
<td class="">Size of rolling window to calculate ratios. Default is 10.</td>
</tr>
<tr>
<td class="">success-threshold</td>
<td class="">int</td>
<td class="">Number of successful calls to transition to <em>closed</em> state. Default is 1.</td>
</tr>
<tr>
<td class="">cancel-source</td>
<td class="">boolean</td>
<td class="">A flag indicating if this task&#8217;s source should be cancelled if the task is cancelled.
Default is <code>true</code>.</td>
</tr>
</tbody>
</table>
</div>

</div>


<h3 id="_bulkhead">Bulkhead</h3>
<div class="section">

<div class="table__overflow elevation-1  flex sm7
">
<table class="datatable table">
<colgroup>
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 60%;">
</colgroup>
<thead>
</thead>
<tbody>
<tr>
<td class="">Property</td>
<td class="">Type</td>
<td class="">Description</td>
</tr>
<tr>
<td class="">limit</td>
<td class="">int</td>
<td class="">Max number of parallel calls. Default is 10.</td>
</tr>
<tr>
<td class="">name</td>
<td class="">String</td>
<td class="">A name given to the task for debugging purposes. Default is <code>Bulkhead-N</code>.</td>
</tr>
<tr>
<td class="">queue-length</td>
<td class="">int</td>
<td class="">Length of queue for tasks waiting to enter. Default is 10.</td>
</tr>
<tr>
<td class="">cancel-source</td>
<td class="">boolean</td>
<td class="">A flag indicating if this task&#8217;s source should be cancelled if the task is cancelled.
Default is <code>true</code>.</td>
</tr>
</tbody>
</table>
</div>

</div>


<h3 id="_retry">Retry</h3>
<div class="section">

<div class="table__overflow elevation-1  flex sm7
">
<table class="datatable table">
<colgroup>
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 60%;">
</colgroup>
<thead>
</thead>
<tbody>
<tr>
<td class="">Property</td>
<td class="">Type</td>
<td class="">Description</td>
</tr>
<tr>
<td class="">name</td>
<td class="">String</td>
<td class="">A name given to the task for debugging purposes. Default is <code>Retry-N</code>.</td>
</tr>
<tr>
<td class="">overall-timeout</td>
<td class="">Duration</td>
<td class="">Timeout for overall retry execution. Default is <code>PT1S</code> or 1 second.</td>
</tr>
<tr>
<td class="">delaying-retry-policy</td>
<td class="">Config</td>
<td class="">Config section describing delaying retry policy (see below).</td>
</tr>
<tr>
<td class="">jitter-retry-policy</td>
<td class="">Config</td>
<td class="">Config section describing jitter retry policy (see below).</td>
</tr>
<tr>
<td class="">cancel-source</td>
<td class="">boolean</td>
<td class="">A flag indicating if this task&#8217;s source should be cancelled if the task is cancelled.
Default is <code>true</code>.</td>
</tr>
</tbody>
</table>
</div>


<h4 id="_delaying_retry_policy">Delaying Retry Policy</h4>
<div class="section">

<div class="table__overflow elevation-1  flex sm7
">
<table class="datatable table">
<colgroup>
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 60%;">
</colgroup>
<thead>
</thead>
<tbody>
<tr>
<td class="">Property</td>
<td class="">Type</td>
<td class="">Description</td>
</tr>
<tr>
<td class="">calls</td>
<td class="">int</td>
<td class="">Number of retry attempts. Default is 3.</td>
</tr>
<tr>
<td class="">delay</td>
<td class="">Duration</td>
<td class="">Delay between retries. Default is <code>PT0.2S</code> or 200 milliseconds.</td>
</tr>
<tr>
<td class="">delay-factor</td>
<td class="">double</td>
<td class="">A delay multiplication factor applied after each retry.</td>
</tr>
</tbody>
</table>
</div>
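
<p>One plausible reading of <code>delay-factor</code> is that the delay grows geometrically with each retry.
The sketch below illustrates that interpretation only; it is an assumption, not a statement of how
the Helidon policy computes delays, and <code>DelayingSketch</code> and <code>delayBefore</code>
are hypothetical names.</p>

```java
import java.time.Duration;

final class DelayingSketch {
    /**
     * Delay before retry number 'retryIndex' (0 for the first retry),
     * assuming the base delay is multiplied by 'factor' after each retry.
     */
    static Duration delayBefore(int retryIndex, Duration delay, double factor) {
        double millis = delay.toMillis() * Math.pow(factor, retryIndex);
        return Duration.ofMillis(Math.round(millis));
    }
}
```

<p>Under this assumption, a delay of 200 milliseconds and a factor of 2 would give waits of
200, 400 and 800 milliseconds before the first three retries.</p>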

</div>


<h4 id="_jitter_retry_policy">Jitter Retry Policy</h4>
<div class="section">

<div class="table__overflow elevation-1  flex sm7
">
<table class="datatable table">
<colgroup>
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 60%;">
</colgroup>
<thead>
</thead>
<tbody>
<tr>
<td class="">Property</td>
<td class="">Type</td>
<td class="">Description</td>
</tr>
<tr>
<td class="">calls</td>
<td class="">int</td>
<td class="">Number of retry attempts. Default is 3.</td>
</tr>
<tr>
<td class="">delay</td>
<td class="">Duration</td>
<td class="">Delay between retries. Default is <code>PT0.2S</code> or 200 milliseconds.</td>
</tr>
<tr>
<td class="">jitter</td>
<td class="">Duration</td>
<td class="">A random delay additive factor in the range <code>[-jitter, +jitter]</code>
applied after each retry.</td>
</tr>
</tbody>
</table>
</div>
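
<p>The additive jitter described in the table can be sketched as a base delay plus a random offset
drawn from <code>[-jitter, +jitter]</code>. This is a minimal illustration, not Helidon's implementation:
<code>JitterSketch</code> and <code>nextDelay</code> are hypothetical names, millisecond granularity
and a non-negative jitter are assumed, and the result is clamped at zero.</p>

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

final class JitterSketch {
    /** Returns 'delay' shifted by a random offset in [-jitter, +jitter], never negative. */
    static Duration nextDelay(Duration delay, Duration jitter) {
        long j = jitter.toMillis();
        // nextLong's upper bound is exclusive, hence j + 1
        long offset = ThreadLocalRandom.current().nextLong(-j, j + 1);
        return Duration.ofMillis(Math.max(0, delay.toMillis() + offset));
    }
}
```

<p>Randomizing the delay spreads out retries from clients that failed at the same moment, so they
do not all hit the recovering service at once.</p>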

</div>

</div>

</div>


<h2 id="_examples">Examples</h2>
<div class="section">
<p>See the <router-link to="#_api" @click.native="this.scrollFix('#_api')">API</router-link> section for examples.</p>

</div>


<h2 id="_additional_information">Additional Information</h2>
<div class="section">
<p>For additional information, see the
<a target="_blank" href="./apidocs/io.helidon.reactive.faulttolerance/module-summary.html">Fault Tolerance SE API Javadocs</a>.</p>

</div>

</doc-view>
