IBM watsonx.ai
You can develop generative AI solutions with foundation models in IBM watsonx.ai. You can use prompts to generate, classify, summarize, or extract content from your input text. Choose from IBM models or open source models from Hugging Face. You can tune foundation models to customize your prompt output or optimize inferencing performance.
| Supported only for IBM watsonx as a service on IBM Cloud. |
Using watsonx.ai
To use watsonx.ai LLMs, add the following dependency to your project:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-watsonx</artifactId>
    <version>0.11.1</version>
</dependency>
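If your project uses Gradle instead of Maven, the same coordinates translate directly (this is a straightforward translation of the Maven snippet above, not a separately documented declaration):

```gradle
implementation("io.quarkiverse.langchain4j:quarkus-langchain4j-watsonx:0.11.1")
```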
If no other LLM extension is installed, AI Services will automatically use the configured watsonx model provider.
Configuration
To use the watsonx.ai dependency, you must configure some required values in the application.properties file.
Base URL
The base-url property depends on the region of the provided service instance. Use one of the following values:

- Dallas: https://us-south.ml.cloud.ibm.com
- Frankfurt: https://eu-de.ml.cloud.ibm.com

For example, for an instance in the Dallas region:

quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com
Project ID
To prompt foundation models in watsonx.ai programmatically, you need to pass the identifier (ID) of a project.
To get the ID of a project, complete the following steps:
1. Open the project, and then click the Manage tab.
2. Copy the project ID from the Details section of the General page.
quarkus.langchain4j.watsonx.project-id=23d...
API Key
To prompt foundation models in IBM watsonx.ai programmatically, you need an IBM Cloud API key.
quarkus.langchain4j.watsonx.api-key=hG-...
| To create an API key, go to https://cloud.ibm.com/iam/apikeys and generate one. |
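Instead of storing the key in application.properties, you can supply it at runtime through the standard Quarkus property-to-environment-variable mapping (the variable name below follows that convention; the value is a placeholder):

```
export QUARKUS_LANGCHAIN4J_WATSONX_API_KEY=hG-...
```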
All configuration properties
Configuration property fixed at build time - All other configuration properties are overridable at runtime

| Configuration property | Type | Default |
|---|---|---|
| Whether the model should be enabled | boolean | |
| Base URL | string | |
| IBM Cloud API key | string | |
| Timeout for watsonx.ai API calls | duration | |
| Version to use | string | |
| Watsonx.ai project ID | string | |
| Whether the watsonx.ai client should log requests | boolean | false |
| Whether the watsonx.ai client should log responses | boolean | false |
| Whether or not to enable the integration | boolean | true |
| IAM base URL | string | |
| Timeout for IAM API calls | duration | |
| IAM grant type | string | |
| Model to use | string | |
| The strategy used for picking tokens during generation of the output text. Options are greedy and sample; defaults to sample if not specified. With greedy, each successive token is the highest-probability token given the text already generated. This strategy can lead to repetitive results, especially for longer output sequences. With sample, subsequent tokens are picked from the probability distribution of possible next tokens, conditioned on the already-generated text and the top_k and top_p parameters described below. | string | sample |
| A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output; values greater than 1.0 flatten it, resulting in "more random" output. A value of 1.0 has no effect and is the default. The allowed range is 0.0 to 2.0. | double | 1.0 |
| If stop sequences are given, they are ignored until the minimum number of tokens has been generated. | int | 0 |
| The maximum number of new tokens to be generated. The range is 0 to 1024. | int | |
| Random number generator seed to use in sampling mode, for experimental repeatability. Must be >= 1. | int | |
| Stop sequences are one or more strings that cause text generation to stop if/when they are produced as part of the output. Stop sequences encountered before the minimum number of tokens have been generated are ignored. The list may contain up to 6 strings. | list of string | |
| The number of highest-probability vocabulary tokens to keep for top-k filtering. Only applies in sampling mode, with a range from 1 to 100. When the decoding strategy is sample, only the top_k most likely tokens are considered as candidates for the next generated token. | int | |
| Similar to top_k, except the candidates for the next token are the most likely tokens with probabilities that add up to at least top_p. The valid range is 0.0 to 1.0, where 1.0 is equivalent to disabled and is the default. Also known as nucleus sampling. | double | 1.0 |
| The penalty applied to tokens that have already been generated or that belong to the context. The range is 1.0 to 2.0 and the default is 1.0 (no penalty). | double | 1.0 |
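As an illustration, the generation parameters above can be set in application.properties. The model-id property appears in the example later on this page; the other property names below are assumptions inferred from the parameter descriptions, so verify them against the extension's generated configuration reference:

```properties
quarkus.langchain4j.watsonx.chat-model.model-id=ibm/granite-13b-chat-v2
# Assumed property names for the sampling parameters described above
quarkus.langchain4j.watsonx.chat-model.decoding-method=sample
quarkus.langchain4j.watsonx.chat-model.temperature=0.7
quarkus.langchain4j.watsonx.chat-model.max-new-tokens=200
quarkus.langchain4j.watsonx.chat-model.top-k=50
quarkus.langchain4j.watsonx.chat-model.top-p=0.9
```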
About the Duration format
To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

- If the value is only a number, it represents time in seconds.
- If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

- If the value is a number followed by h, m, or s, it is prefixed with PT.
- If the value is a number followed by d, it is prefixed with P.
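For example, a 60-second timeout for watsonx.ai API calls can be expressed in the simplified format (the timeout property name here is an assumption based on the "Timeout for watsonx.ai API calls" entry in the table above):

```properties
quarkus.langchain4j.watsonx.timeout=60s
```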
Example
An example usage is the following:

quarkus.langchain4j.watsonx.api-key=hG-...
quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com
quarkus.langchain4j.watsonx.chat-model.model-id=ibm/granite-13b-chat-v2

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

public record Result(Integer result) {}

@RegisterAiService
public interface LLMService {

    @SystemMessage("You are a calculator")
    @UserMessage("""
        You must perform the mathematical operation delimited by ---
        ---
        {firstNumber} + {secondNumber}
        ---
        """)
    Result calculator(int firstNumber, int secondNumber);
}

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

@Path("/llm")
public class LLMResource {

    @Inject
    LLMService llmService;

    @GET
    @Path("/calculator")
    public Result calculator() {
        return llmService.calculator(2, 2);
    }
}
❯ curl http://localhost:8080/llm/calculator
{"result":4}
| Sometimes it may be useful to use the quarkus.langchain4j.watsonx.chat-model.stop-sequences property to prevent the LLM from returning more output than desired. |
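For instance, in the calculator example above, generation could be stopped as soon as the model emits the --- delimiter used in the prompt (the value shown is illustrative):

```properties
quarkus.langchain4j.watsonx.chat-model.stop-sequences=---
```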