<doc-view>

<h2 id="_contents">Contents</h2>
<div class="section">
<ul class="ulist">
<li>
<p><router-link to="#_overview" @click.native="this.scrollFix('#_overview')">Overview</router-link></p>

</li>
<li>
<p><router-link to="#maven-coordinates" @click.native="this.scrollFix('#maven-coordinates')">Maven Coordinates</router-link></p>

</li>
<li>
<p><router-link to="#_api" @click.native="this.scrollFix('#_api')">API</router-link></p>

</li>
<li>
<p><router-link to="#_examples" @click.native="this.scrollFix('#_examples')">Examples</router-link></p>

</li>
<li>
<p><router-link to="#_additional_information" @click.native="this.scrollFix('#_additional_information')">Additional Information</router-link></p>

</li>
</ul>

</div>


<h2 id="_overview">Overview</h2>
<div class="section">
<p>Helidon Fault Tolerance support is inspired by
<a target="_blank" href="https://download.eclipse.org/microprofile/microprofile-fault-tolerance-4.0.2/microprofile-fault-tolerance-spec-4.0.2.html">MicroProfile Fault Tolerance</a>.
The API defines the notion of a <em>fault handler</em> that can be combined with other handlers to
improve application robustness. Handlers are created to manage error conditions (faults)
that may occur in real-world application environments. Examples include service restarts,
network delays, temporary infrastructure instabilities, etc.</p>

<p>The interaction of multiple microservices brings new challenges from distributed systems
that require careful planning. Faults in distributed systems should be compartmentalized
to avoid unnecessary service interruptions. For example, if comparable information can
be obtained from multiple sources, a user request <em>should not</em> be denied when a subset
of these sources is unreachable or offline. Similarly, if a non-essential source has been
flagged as unreachable, an application should avoid continuous access to that source
as that would result in much higher response times.</p>

<p>In order to tackle the most common types of application faults, the Helidon Fault
Tolerance API provides support for circuit breakers, retries, timeouts, bulkheads and fallbacks.
In addition, the API makes it very easy to create and monitor asynchronous tasks that
do not require explicit creation and management of threads or executors.</p>

<p>For more information, see
<a target="_blank" href="/apidocs/io.helidon.faulttolerance/module-summary.html">Fault Tolerance API Javadocs</a>.</p>

</div>


<h2 id="maven-coordinates">Maven Coordinates</h2>
<div class="section">
<p>To enable Fault Tolerance,
add the following dependency to your project&#8217;s <code>pom.xml</code> (see
<router-link to="/about/managing-dependencies">Managing Dependencies</router-link>).</p>

<markup
lang="xml"

>&lt;dependency&gt;
    &lt;groupId&gt;io.helidon.fault-tolerance&lt;/groupId&gt;
    &lt;artifactId&gt;helidon-fault-tolerance&lt;/artifactId&gt;
&lt;/dependency&gt;</markup>

</div>


<h2 id="_api">API</h2>
<div class="section">
<p>The Fault Tolerance API is <em>blocking</em> and based on the JDK&#8217;s virtual thread model.
As a result, methods return <em>direct</em> values instead of promises in the form of
<code>Single&lt;T&gt;</code> or <code>Multi&lt;T&gt;</code>.</p>

<p>In the sections that follow, we shall briefly explore each of the constructs provided
by this API.</p>


<h3 id="_retries">Retries</h3>
<div class="section">
<p>Temporary network problems can sometimes be mitigated by simply retrying
a certain task. A <code>Retry</code> handler is created using a <code>RetryPolicy</code> that
indicates the number of retries, delay between retries, etc.</p>

<markup
lang="java"

>Retry retry = Retry.builder()
        .retryPolicy(Retry.JitterRetryPolicy.builder()
                             .calls(3)
                             .delay(Duration.ofMillis(100))
                             .build())
        .build();
T result = retry.invoke(this::retryOnFailure);</markup>

<p>The sample code above will make up to 3 calls to the supplier <code>this::retryOnFailure</code>,
with a delay of 100 milliseconds, randomized by the policy&#8217;s jitter, between attempts.</p>

<div class="admonition note">
<p class="admonition-inline">The return type of method <code>retryOnFailure</code> in the example above must
be some <code>T</code>, and the parameter to the retry handler&#8217;s <code>invoke</code>
method is a <code>Supplier&lt;? extends T&gt;</code>.</p>
</div>

<p>If the call to the supplier provided completes exceptionally, it will be treated as
a failure and retried until the maximum number of attempts is reached; finer control
is possible by creating a retry policy and using methods such as
<code>applyOn(Class&lt;? extends Throwable&gt;&#8230;&#8203; classes)</code> and
<code>skipOn(Class&lt;? extends Throwable&gt;&#8230;&#8203; classes)</code> to control the exceptions
that must be retried and those that must be ignored.</p>
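
<p>The interplay between the attempt limit and exception filtering can be pictured with a
plain-JDK sketch. It is not Helidon&#8217;s implementation: <code>RetrySketch</code>, <code>retry</code>
and the <code>retryOn</code> predicate are hypothetical names that stand in for the handler and its
exception filters, and the sketch assumes a fixed delay without jitter.</p>

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical illustration only; not part of the Helidon API.
final class RetrySketch {

    /**
     * Calls the supplier up to {@code calls} times in total, pausing for
     * {@code delay} between attempts. Failures rejected by {@code retryOn}
     * are rethrown immediately instead of being retried.
     */
    static <T> T retry(Supplier<? extends T> supplier, int calls, Duration delay,
                       Predicate<Throwable> retryOn) {
        if (calls < 1) {
            throw new IllegalArgumentException("calls must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= calls; attempt++) {
            try {
                return supplier.get();
            } catch (RuntimeException e) {
                if (!retryOn.test(e)) {
                    throw e;                // filtered out: fail without retrying
                }
                last = e;                   // remember the most recent failure
                if (attempt < calls) {
                    pause(delay);
                }
            }
        }
        throw last;                         // all attempts failed
    }

    private static void pause(Duration delay) {
        try {
            Thread.sleep(delay.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

<p>In this sketch, a supplier that fails twice with a retryable exception and then succeeds
is called three times, while a failure rejected by the predicate surfaces after a single call.</p>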

</div>


<h3 id="_timeouts">Timeouts</h3>
<div class="section">
<p>A request to a service that is inaccessible or simply unavailable should be bounded
to ensure a certain quality of service and response time. Timeouts can be configured
to avoid excessive waiting times. In addition, a fallback action can be defined
if a timeout expires as we shall cover in the next section.</p>

<p>The following is an example of using <code>Timeout</code>:</p>

<markup
lang="java"

>T result = Timeout.create(Duration.ofMillis(10))
        .invoke(this::mayTakeVeryLong);</markup>

<div class="admonition note">
<p class="admonition-inline">Using a handler&#8217;s <code>create</code> method is an alternative to using a builder that is
more convenient when default settings are acceptable.</p>
</div>

<p>The example above monitors the call to method <code>mayTakeVeryLong</code> and reports a
<code>TimeoutException</code> if the execution takes more than 10 milliseconds to complete.</p>

</div>


<h3 id="_fallbacks">Fallbacks</h3>
<div class="section">
<p>A fallback to a <em>known</em> result can sometimes be an alternative to
reporting an error. For example, if we are unable to access a service
we may fall back to the last result obtained from that service at an
earlier time.</p>

<p>A <code>Fallback</code> instance is created by providing a function that takes a <code>Throwable</code>
and produces some <code>T</code> to be used when the intended method fails to return a value:</p>

<markup
lang="java"

>T result = Fallback.createFromMethod(throwable -&gt; lastKnownValue)
        .invoke(this::mayFail);</markup>

<p>This example calls the method <code>mayFail</code> and if it produces a <code>Throwable</code>, it maps
it to the last known value using the fallback handler.</p>
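
<p>To make the relationship between a timeout and a fallback concrete, here is a minimal
sketch built only on the JDK. It is an illustration under stated assumptions, not how Helidon
implements these handlers: <code>TimeoutFallbackSketch</code>, <code>withTimeout</code>,
<code>withFallback</code> and <code>slowCall</code> are hypothetical names, and the timeout is
reported as an <code>IllegalStateException</code> purely for brevity.</p>

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration only; not part of the Helidon API.
final class TimeoutFallbackSketch {

    /** Runs the supplier on another thread and stops waiting after {@code limit}. */
    static <T> T withTimeout(Supplier<T> supplier, Duration limit) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(supplier);
        try {
            return future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // stop waiting; the task itself may keep running
            throw new IllegalStateException("timed out after " + limit, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns the supplier's value, or maps any runtime failure to a fallback value. */
    static <T> T withFallback(Supplier<? extends T> supplier,
                              Function<Throwable, ? extends T> fallback) {
        try {
            return supplier.get();
        } catch (RuntimeException e) {
            return fallback.apply(e);
        }
    }

    /** Simulates a remote call that takes about one second. */
    static String slowCall() {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "fresh";
    }
}
```

<p>Wrapping <code>withTimeout(TimeoutFallbackSketch::slowCall, Duration.ofMillis(50))</code> in
<code>withFallback</code> with a function that returns a cached value yields the cached value,
because the simulated call takes far longer than the limit.</p>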

</div>


<h3 id="_circuit_breakers">Circuit Breakers</h3>
<div class="section">
<p>Failing to execute a certain task or to call another service repeatedly can have a direct
impact on application performance. It is often preferred to avoid calls to non-essential
services by simply preventing that logic from executing altogether. A circuit breaker can be
configured to monitor such calls and block attempts that are likely to fail, thus improving
overall performance.</p>

<p>Circuit breakers start in a <em>closed</em> state, letting calls proceed normally; after
detecting a certain number of errors during a pre-defined processing window, they can <em>open</em> to
prevent additional failures. After a circuit has been opened, it can transition
first to a <em>half-open</em> state before finally transitioning back to a closed state.
The use of an intermediate state (half-open)
makes transitions from open to closed more gradual, and prevents a circuit breaker
from eagerly transitioning to states without considering sufficient observations.</p>

<div class="admonition note">
<p class="admonition-inline">Any failure while a circuit breaker is in half-open state will immediately
cause it to transition back to an open state.</p>
</div>

<p>Consider the following example in which <code>this::mayFail</code> is monitored by a
circuit breaker:</p>

<markup
lang="java"

>CircuitBreaker breaker = CircuitBreaker.builder()
        .volume(10)
        .errorRatio(30)
        .delay(Duration.ofMillis(200))
        .successThreshold(2)
        .build();
T result = breaker.invoke(this::mayFail);</markup>

<p>The circuit breaker in this example defines a processing window of size 10, an error
ratio of 30%, a duration to transition to half-open state of 200 milliseconds, and
a success threshold to transition from half-open to closed state of 2 observations.
It follows that,</p>

<ul class="ulist">
<li>
<p>After completing the processing window, if at least 3 errors are detected, the
circuit breaker will transition to the open state, thus blocking the execution
of any subsequent calls.</p>

</li>
<li>
<p>After 200 milliseconds, the circuit breaker will transition back to half-open and
allow calls to proceed again.</p>

</li>
<li>
<p>If the next two calls after transitioning to half-open are successful, the
circuit breaker will transition to closed state; otherwise, it will
transition back to open state, waiting for another 200 milliseconds
before attempting to transition to half-open again.</p>

</li>
</ul>
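
<p>The transitions listed above can be modelled as a small state machine. The following is
a simplified, single-threaded sketch using only the JDK; it is not Helidon&#8217;s implementation.
<code>BreakerSketch</code> and its members are hypothetical names, the clock is injected so the
behavior can be observed without sleeping, and the sketch assumes the error ratio is evaluated
once per completed window, which may differ from the bookkeeping Helidon actually performs.</p>

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical illustration only; not part of the Helidon API. Not thread-safe.
final class BreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int volume;            // size of the processing window
    private final int errorRatio;        // failure percentage that opens the circuit
    private final long delayMillis;      // time spent in OPEN before moving to HALF_OPEN
    private final int successThreshold;  // successes needed in HALF_OPEN to close
    private final LongSupplier clock;    // injected time source, in milliseconds

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private int halfOpenSuccesses;
    private long openedAt;

    BreakerSketch(int volume, int errorRatio, long delayMillis,
                  int successThreshold, LongSupplier clock) {
        this.volume = volume;
        this.errorRatio = errorRatio;
        this.delayMillis = delayMillis;
        this.successThreshold = successThreshold;
        this.clock = clock;
    }

    /** Returns the current state, moving from OPEN to HALF_OPEN once the delay has passed. */
    State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= delayMillis) {
            state = State.HALF_OPEN;
            halfOpenSuccesses = 0;
        }
        return state;
    }

    <T> T invoke(Supplier<? extends T> supplier) {
        if (state() == State.OPEN) {
            throw new IllegalStateException("circuit is open");  // call is blocked
        }
        try {
            T result = supplier.get();
            record(true);
            return result;
        } catch (RuntimeException e) {
            record(false);
            throw e;
        }
    }

    private void record(boolean success) {
        if (state == State.HALF_OPEN) {
            if (!success) {
                open();                                   // any failure reopens the circuit
            } else if (++halfOpenSuccesses >= successThreshold) {
                close();
            }
            return;
        }
        calls++;
        if (!success) {
            failures++;
        }
        if (calls == volume) {                            // window complete: check the ratio
            if (failures * 100 >= errorRatio * volume) {
                open();
            } else {
                resetWindow();
            }
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        resetWindow();
    }

    private void close() {
        state = State.CLOSED;
        resetWindow();
    }

    private void resetWindow() {
        calls = 0;
        failures = 0;
        halfOpenSuccesses = 0;
    }
}
```

<p>With the values from the example (window of 10, ratio of 30, delay of 200 milliseconds,
threshold of 2), seven successes followed by three failures open this model; once the injected
clock advances by 200 milliseconds it reports half-open, and two successful calls close it again.</p>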

<p>A circuit breaker will throw an
<code>io.helidon.faulttolerance.CircuitBreakerOpenException</code>
if an attempt to make an invocation takes place while it is in open state.</p>

</div>


<h3 id="_bulkheads">Bulkheads</h3>
<div class="section">
<p>Concurrent access to certain components may need to be limited to avoid
excessive use of resources. For example, if an invocation that opens
a network connection is allowed to execute concurrently without
any restriction, and if the service on the other end is slow to respond,
it is possible for the rate at which network connections are opened
to exceed the maximum number of connections allowed. Faults of this
type can be prevented by guarding these invocations using a bulkhead.</p>

<div class="admonition note">
<p class="admonition-inline">The name <em>bulkhead</em> comes from the partitions that
divide a ship&#8217;s hull. If a partition is compromised
(e.g., filled with water), it can be isolated so that it does not
affect the rest of the hull.</p>
</div>

<p>A waiting queue can be associated with a bulkhead to handle tasks
that are submitted when the bulkhead is already at full capacity.</p>

<markup
lang="java"

>Bulkhead bulkhead = Bulkhead.builder()
        .limit(3)
        .queueLength(5)
        .build();
T result = bulkhead.invoke(this::usesResources);</markup>

<p>This example creates a bulkhead that limits concurrent execution
of <code>this::usesResources</code> to at most 3 calls, with a waiting queue of size 5. The
bulkhead will report an <code>io.helidon.faulttolerance.BulkheadException</code> if unable
to proceed with the call, that is, when the limit has been reached and the queue
is at maximum capacity.</p>
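
<p>One way to picture how a limit and a waiting queue interact is with two semaphores. The
sketch below uses only the JDK and is not how Helidon implements bulkheads;
<code>BulkheadSketch</code> and its fields are hypothetical names, and rejection is signalled
with an <code>IllegalStateException</code> as a stand-in for the real exception type.</p>

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical illustration only; not part of the Helidon API.
final class BulkheadSketch {
    private final Semaphore admitted;  // calls that are either running or queued
    private final Semaphore running;   // calls that are executing right now

    BulkheadSketch(int limit, int queueLength) {
        this.admitted = new Semaphore(limit + queueLength);
        this.running = new Semaphore(limit, true);  // fair: queued calls run in arrival order
    }

    <T> T invoke(Supplier<? extends T> supplier) {
        // Reject at once when both the execution slots and the queue are taken.
        if (!admitted.tryAcquire()) {
            throw new IllegalStateException("bulkhead is full");
        }
        try {
            // Waits here (the "queue") while the maximum number of calls is running.
            running.acquireUninterruptibly();
            try {
                return supplier.get();
            } finally {
                running.release();
            }
        } finally {
            admitted.release();
        }
    }
}
```

<p>With a limit of 3 and a queue of 5, this model lets three calls run, parks up to five more
until a slot frees up, and rejects a ninth concurrent call immediately.</p>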

</div>


<h3 id="_asynchronous">Asynchronous</h3>
<div class="section">
<p>Asynchronous tasks can be created or forked by using an <code>Async</code> instance. A supplier of type
<code>T</code> is provided as the argument when invoking this handler. For example:</p>

<markup
lang="java"

>CompletableFuture&lt;Thread&gt; cf = Async.create().invoke(Thread::currentThread);
cf.thenAccept(t -&gt; System.out.println("Async task executed in thread " + t));</markup>

<p>The supplier <code>() -&gt; Thread.currentThread()</code> is executed in a new thread, and
the value it produces is printed by the consumer passed to <code>thenAccept</code>.</p>

<p>By default, asynchronous tasks are executed using a <em>new virtual thread per
task</em> based on the <code>ExecutorService</code> defined in
<code>io.helidon.faulttolerance.FaultTolerance</code> and
configurable by an application. Alternatively, an <code>ExecutorService</code> can be specified
when building a non-standard <code>Async</code> instance.</p>

</div>


<h3 id="_handler_composition">Handler Composition</h3>
<div class="section">
<p>Method invocations can be guarded by any combination of the handlers
presented above. For example, an invocation that
times out can be retried a few times before resorting to a fallback value
if it never succeeds.</p>

<p>The easiest way to achieve handler composition is by using a builder in the
<code>FaultTolerance</code> class as shown in the following example:</p>

<markup
lang="java"

>FaultTolerance.TypedBuilder&lt;T&gt; builder = FaultTolerance.typedBuilder();

Timeout timeout = Timeout.create(Duration.ofMillis(10));
builder.addTimeout(timeout);

Retry retry = Retry.builder()
        .retryPolicy(Retry.JitterRetryPolicy.builder()
                             .calls(3)
                             .delay(Duration.ofMillis(100))
                             .build())
        .build();
builder.addRetry(retry);

Fallback&lt;T&gt; fallback = Fallback.createFromMethod(throwable -&gt; lastKnownValue);
builder.addFallback(fallback);

T result = builder.build().invoke(this::mayTakeVeryLong);</markup>

<p>The exact order in which handlers are added to a builder depends on the use case,
but generally the order starting from innermost to outermost should be: bulkhead,
timeout, circuit breaker, retry and fallback. That is, fallback is the first
handler in the chain (the last to be executed once a value is returned)
and bulkhead is the last one (the first to be executed once a value is returned).</p>

<div class="admonition note">
<p class="admonition-inline">This is the ordering used by the MicroProfile Fault Tolerance implementation
in Helidon when a method is decorated with multiple annotations.</p>
</div>
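
<p>What <em>innermost</em> and <em>outermost</em> mean can be shown by nesting plain suppliers.
This sketch uses only the JDK and says nothing about how the <code>FaultTolerance</code> builder
is implemented; <code>CompositionSketch</code>, <code>Handler</code>, <code>named</code> and
<code>compose</code> are hypothetical names, and each "handler" merely logs when it is entered
and exited.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration only; not part of the Helidon API.
final class CompositionSketch {

    /** A handler decorates a supplier and returns the guarded supplier. */
    interface Handler<T> extends Function<Supplier<T>, Supplier<T>> { }

    /** Creates a handler that only records when it is entered and exited. */
    static <T> Handler<T> named(String name, List<String> log) {
        return next -> () -> {
            log.add("enter " + name);
            try {
                return next.get();
            } finally {
                log.add("exit " + name);
            }
        };
    }

    /** Wraps the task with handlers listed from innermost to outermost. */
    @SafeVarargs
    static <T> Supplier<T> compose(Supplier<T> task, Handler<T>... innermostFirst) {
        Supplier<T> guarded = task;
        for (Handler<T> handler : innermostFirst) {
            guarded = handler.apply(guarded);   // each handler wraps everything before it
        }
        return guarded;
    }

    /** Runs a task guarded by timeout (innermost), retry and fallback (outermost). */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Supplier<String> task = () -> {
            log.add("task");
            return "value";
        };
        Supplier<String> guarded = compose(task,
                named("timeout", log), named("retry", log), named("fallback", log));
        guarded.get();
        return log;
    }
}
```

<p>Calling <code>demo()</code> records the outermost handler first on the way in and last on the
way out: enter fallback, enter retry, enter timeout, task, exit timeout, exit retry, exit fallback.</p>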

</div>

</div>


<h2 id="_examples">Examples</h2>
<div class="section">
<p>See the <router-link to="#_api" @click.native="this.scrollFix('#_api')">API</router-link> section for examples.</p>

</div>


<h2 id="_additional_information">Additional Information</h2>
<div class="section">
<p>For additional information, see the
<a target="_blank" href="/apidocs/io.helidon.faulttolerance/module-summary.html">Fault Tolerance API Javadocs</a>.</p>

</div>

</doc-view>
