IBM watsonx.ai

You can develop generative AI solutions with foundation models in IBM watsonx.ai. You can use prompts to generate, classify, summarize, or extract content from your input text. Choose from IBM models or open source models from Hugging Face. You can tune foundation models to customize your prompt output or optimize inferencing performance.

Supported only for IBM watsonx as a service on IBM Cloud.

Using watsonx.ai

To use watsonx.ai LLMs, add the following dependency to your project's pom.xml:

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-watsonx</artifactId>
    <version>0.11.1</version>
</dependency>
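If your project uses Gradle instead of Maven, the same coordinates can be declared in the Kotlin DSL (a sketch; adjust to your build setup):

```kotlin
implementation("io.quarkiverse.langchain4j:quarkus-langchain4j-watsonx:0.11.1")
```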

If no other LLM extension is installed, AI Services will automatically use the configured watsonx model provider.

Configuration

To use the watsonx.ai dependency, you must configure some required values in the application.properties file.

Base URL

The base-url property depends on the region of the provided service instance. For example, for the Dallas (us-south) region:

quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com
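Endpoints for other regions follow the same `<region>.ml.cloud.ibm.com` pattern; for example, for an instance in Frankfurt (verify the endpoint for your instance in the IBM Cloud console):

```properties
# Frankfurt (eu-de) region endpoint (illustrative; check your instance's region)
quarkus.langchain4j.watsonx.base-url=https://eu-de.ml.cloud.ibm.com
```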

Project ID

To prompt foundation models in watsonx.ai programmatically, you need to pass the identifier (ID) of a project.

To get the ID of a project, complete the following steps:

  1. Open the project, and then click the Manage tab.

  2. Copy the project ID from the Details section of the General page.

quarkus.langchain4j.watsonx.project-id=23d...

API Key

To prompt foundation models in IBM watsonx.ai programmatically, you need an IBM Cloud API key.

To create an API key, go to https://cloud.ibm.com/iam/apikeys and generate one.

quarkus.langchain4j.watsonx.api-key=hG-...
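Rather than hard-coding the key in application.properties, you can also supply it at runtime through the environment variable documented in the configuration reference below:

```shell
export QUARKUS_LANGCHAIN4J_WATSONX_API_KEY=hG-...
```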

All configuration properties

Configuration property fixed at build time - All other configuration properties are overridable at runtime

Configuration property

Type

Default

Whether the model should be enabled

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_ENABLED

boolean

true

Base URL

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_BASE_URL

URL

https://dummy.ai/api

IBM Cloud API key

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_API_KEY

string

dummy

Timeout for watsonx.ai API calls

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_TIMEOUT

Duration

10S

Version to use

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_VERSION

string

2023-05-29

The watsonx.ai project ID.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_PROJECT_ID

string

dummy

Whether the watsonx.ai client should log requests

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_LOG_REQUESTS

boolean

false

Whether the watsonx.ai client should log responses

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_LOG_RESPONSES

boolean

false

Whether or not to enable the integration. Defaults to true, which means requests are made to the watsonx.ai provider. Set to false to disable all requests.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_ENABLE_INTEGRATION

boolean

true

IAM base URL

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_IAM_BASE_URL

URL

https://iam.cloud.ibm.com

Timeout for IAM API calls

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_IAM_TIMEOUT

Duration

10S

IAM grant type

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_IAM_GRANT_TYPE

string

urn:ibm:params:oauth:grant-type:apikey

Model to use

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_MODEL_ID

string

ibm/granite-13b-chat-v2

The strategy used for picking tokens during generation of the output text. Options are greedy and sample.

When set to greedy, each successive token corresponds to the highest-probability token given the text that has already been generated. This strategy can lead to repetitive results, especially for longer output sequences. The alternative sample strategy generates text by picking subsequent tokens based on the probability distribution of possible next tokens defined by (i.e., conditioned on) the already-generated text and the top_k and top_p parameters described below.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_DECODING_METHOD

string

greedy

A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output. Values greater than 1.0 flatten the probability distribution, resulting in "more random" output. A value of 1.0 has no effect and is the default. The allowed range is 0.0 to 2.0.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TEMPERATURE

double

1.0

The minimum number of new tokens to be generated. If stop sequences are given, they are ignored until this minimum number of tokens has been generated. Defaults to 0.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_MIN_NEW_TOKENS

int

0

The maximum number of new tokens to be generated. The range is 0 to 1024.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_MAX_NEW_TOKENS

int

200

Random number generator seed to use in sampling mode for experimental repeatability. Must be >= 1.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_RANDOM_SEED

int

Stop sequences are one or more strings which will cause the text generation to stop if/when they are produced as part of the output. Stop sequences encountered prior to the minimum number of tokens being generated will be ignored. The list may contain up to 6 strings.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_STOP_SEQUENCES

list of string

The number of highest-probability vocabulary tokens to keep for top-k filtering. Only applies in sampling mode, with a range from 1 to 100: when the decoding method is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TOP_K

int

Similar to top_k except the candidates to generate the next token are the most likely tokens with probabilities that add up to at least top_p. The valid range is 0.0 to 1.0 where 1.0 is equivalent to disabled and is the default. Also known as nucleus sampling.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_TOP_P

double

The penalty applied to tokens that have already been generated or belong to the context. The range is 1.0 to 2.0 and defaults to 1.0 (no penalty).

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX_CHAT_MODEL_REPETITION_PENALTY

double
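As a sketch of how these generation parameters combine, the following configuration switches the default model to sampling mode (the values are illustrative; property names are the kebab-case form of the environment variables listed above):

```properties
# Pick tokens by sampling instead of greedy decoding
quarkus.langchain4j.watsonx.chat-model.decoding-method=sample
# Slightly sharpen the probability distribution (range 0.0 to 2.0)
quarkus.langchain4j.watsonx.chat-model.temperature=0.7
# Consider only the 50 most likely tokens (range 1 to 100)
quarkus.langchain4j.watsonx.chat-model.top-k=50
# Nucleus sampling: keep tokens whose probabilities sum to 0.9
quarkus.langchain4j.watsonx.chat-model.top-p=0.9
# Allow up to 512 new tokens (range 0 to 1024)
quarkus.langchain4j.watsonx.chat-model.max-new-tokens=512
# Fix the seed for experimental repeatability (must be >= 1)
quarkus.langchain4j.watsonx.chat-model.random-seed=42
```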

Named model config

Type

Default

Base URL

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__BASE_URL

URL

https://dummy.ai/api

IBM Cloud API key

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__API_KEY

string

dummy

Timeout for watsonx.ai API calls

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__TIMEOUT

Duration

10S

Version to use

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__VERSION

string

2023-05-29

The watsonx.ai project ID.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__PROJECT_ID

string

dummy

Whether the watsonx.ai client should log requests

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__LOG_REQUESTS

boolean

false

Whether the watsonx.ai client should log responses

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__LOG_RESPONSES

boolean

false

Whether or not to enable the integration. Defaults to true, which means requests are made to the watsonx.ai provider. Set to false to disable all requests.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__ENABLE_INTEGRATION

boolean

true

IAM base URL

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__IAM_BASE_URL

URL

https://iam.cloud.ibm.com

Timeout for IAM API calls

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__IAM_TIMEOUT

Duration

10S

IAM grant type

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__IAM_GRANT_TYPE

string

urn:ibm:params:oauth:grant-type:apikey

Model to use

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_MODEL_ID

string

ibm/granite-13b-chat-v2

The strategy used for picking tokens during generation of the output text. Options are greedy and sample.

When set to greedy, each successive token corresponds to the highest-probability token given the text that has already been generated. This strategy can lead to repetitive results, especially for longer output sequences. The alternative sample strategy generates text by picking subsequent tokens based on the probability distribution of possible next tokens defined by (i.e., conditioned on) the already-generated text and the top_k and top_p parameters described below.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_DECODING_METHOD

string

greedy

A value used to modify the next-token probabilities in sampling mode. Values less than 1.0 sharpen the probability distribution, resulting in "less random" output. Values greater than 1.0 flatten the probability distribution, resulting in "more random" output. A value of 1.0 has no effect and is the default. The allowed range is 0.0 to 2.0.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_TEMPERATURE

double

1.0

The minimum number of new tokens to be generated. If stop sequences are given, they are ignored until this minimum number of tokens has been generated. Defaults to 0.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_MIN_NEW_TOKENS

int

0

The maximum number of new tokens to be generated. The range is 0 to 1024.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_MAX_NEW_TOKENS

int

200

Random number generator seed to use in sampling mode for experimental repeatability. Must be >= 1.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_RANDOM_SEED

int

Stop sequences are one or more strings which will cause the text generation to stop if/when they are produced as part of the output. Stop sequences encountered prior to the minimum number of tokens being generated will be ignored. The list may contain up to 6 strings.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_STOP_SEQUENCES

list of string

The number of highest-probability vocabulary tokens to keep for top-k filtering. Only applies in sampling mode, with a range from 1 to 100: when the decoding method is set to sample, only the top_k most likely tokens are considered as candidates for the next generated token.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_TOP_K

int

Similar to top_k except the candidates to generate the next token are the most likely tokens with probabilities that add up to at least top_p. The valid range is 0.0 to 1.0 where 1.0 is equivalent to disabled and is the default. Also known as nucleus sampling.

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_TOP_P

double

The penalty applied to tokens that have already been generated or belong to the context. The range is 1.0 to 2.0 and defaults to 1.0 (no penalty).

Environment variable: QUARKUS_LANGCHAIN4J_WATSONX__MODEL_NAME__CHAT_MODEL_REPETITION_PENALTY

double
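The named configuration above registers an additional watsonx model alongside the default one. As a sketch (the name my-model and all values are illustrative), the MODEL_NAME segment of the environment variables maps to a name segment in the property keys:

```properties
# Named watsonx configuration "my-model" (illustrative name and values)
quarkus.langchain4j.watsonx.my-model.base-url=https://us-south.ml.cloud.ibm.com
quarkus.langchain4j.watsonx.my-model.api-key=hG-...
quarkus.langchain4j.watsonx.my-model.project-id=23d...
quarkus.langchain4j.watsonx.my-model.chat-model.model-id=ibm/granite-13b-chat-v2
```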

About the Duration format

To write duration values, use the standard java.time.Duration format. See the Duration#parse() Java API documentation for more information.

You can also use a simplified format, starting with a number:

  • If the value is only a number, it represents time in seconds.

  • If the value is a number followed by ms, it represents time in milliseconds.

In other cases, the simplified format is translated to the java.time.Duration format for parsing:

  • If the value is a number followed by h, m, or s, it is prefixed with PT.

  • If the value is a number followed by d, it is prefixed with P.
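The translation rules above can be sketched in Java (an illustrative mapping only, not the converter Quarkus actually uses):

```java
import java.time.Duration;

public class DurationFormatDemo {

    // Illustrative mapping of the simplified format to java.time.Duration,
    // mirroring the rules listed above (not the actual Quarkus converter).
    static Duration parseSimplified(String value) {
        if (value.matches("\\d+")) {
            // A bare number is interpreted as seconds.
            return Duration.ofSeconds(Long.parseLong(value));
        }
        if (value.matches("\\d+ms")) {
            // A number followed by "ms" is interpreted as milliseconds.
            return Duration.ofMillis(Long.parseLong(value.substring(0, value.length() - 2)));
        }
        if (value.matches("\\d+[hms]")) {
            // h, m, or s suffixes are prefixed with "PT" before parsing.
            return Duration.parse("PT" + value);
        }
        if (value.matches("\\d+d")) {
            // A "d" suffix is prefixed with "P" before parsing.
            return Duration.parse("P" + value);
        }
        // Otherwise the value is already in java.time.Duration format.
        return Duration.parse(value);
    }

    public static void main(String[] args) {
        System.out.println(parseSimplified("10"));    // PT10S
        System.out.println(parseSimplified("500ms")); // PT0.5S
        System.out.println(parseSimplified("2h"));    // PT2H
        System.out.println(parseSimplified("1d"));    // PT24H
    }
}
```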

Example

An example usage is the following:

quarkus.langchain4j.watsonx.api-key=hG-...
quarkus.langchain4j.watsonx.base-url=https://us-south.ml.cloud.ibm.com
quarkus.langchain4j.watsonx.chat-model.model-id=ibm/granite-13b-chat-v2

public record Result(Integer result) {}

@RegisterAiService
public interface LLMService {

    @SystemMessage("You are a calculator")
    @UserMessage("""
        You must perform the mathematical operation delimited by ---
        ---
        {firstNumber} + {secondNumber}
        ---
    """)
    public Result calculator(int firstNumber, int secondNumber);
}

@Path("/llm")
public class LLMResource {

    @Inject
    LLMService llmService;

    @GET
    @Path("/calculator")
    public Result calculator() {
        return llmService.calculator(2, 2);
    }
}

❯ curl http://localhost:8080/llm/calculator
{"result":4}

Sometimes it may be useful to set the quarkus.langchain4j.watsonx.chat-model.stop-sequences property to prevent the LLM from returning more output than desired.
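For instance, a sketch of such a configuration (the sequences themselves are illustrative; Quarkus list-typed properties are comma-separated, and the list may contain up to 6 strings):

```properties
# Stop generation as soon as either illustrative marker appears in the output
quarkus.langchain4j.watsonx.chat-model.stop-sequences=---,END
```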