Ollama
Prerequisites
To use Ollama, you need to have a running Ollama server. Go to the Ollama download page and download the server for your platform.
Once installed, check that Ollama is running using:
> ollama list
It may not display any models yet, which is fine. Let's pull the llama2 model:
> ollama pull llama2
Models are huge. For example, llama2 is 3.8 GB. Make sure you have enough disk space.
Let’s also pull the default embedding model:
> ollama pull nomic-embed-text
Dev Service
If you have Ollama running locally, you do not need a dev service. However, if you want to use the Ollama dev service, add the following dependency to your project:
Then, in your
The dev service will start an Ollama server for you, using a Docker container. Note that the provisioning can take some time.
Using Ollama
To integrate with models running on Ollama, add the following dependency into your project:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
    <version>0.11.1</version>
</dependency>
If no other LLM extension is installed, AI Services will automatically utilize the configured Ollama model.
By default, the extension uses llama2, the model we pulled in the previous section.
You can change it by setting the quarkus.langchain4j.ollama.chat-model.model-id property in the application.properties file:
# Do not forget to pull the model before using it, with `ollama pull <model-id>`
quarkus.langchain4j.ollama.chat-model.model-id=mistral
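For instance, a minimal AI Service backed by the configured Ollama chat model could look like the following sketch. The interface name and prompt text are illustrative; only the annotations come from the extension and from LangChain4j:

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

// Uses the configured Ollama chat model when no other LLM extension is installed
@RegisterAiService
public interface Assistant {

    @SystemMessage("You are a concise assistant.")
    String chat(@UserMessage String question);
}

You can then inject Assistant into any CDI bean and call chat(...); each call is sent to the Ollama server configured above.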
Configuration
Several configuration properties are available:
Configuration properties fixed at build time cannot be changed at runtime; all other configuration properties are overridable at runtime.

| Configuration property | Type | Default |
|---|---|---|
| `quarkus.langchain4j.ollama.chat-model.enabled` (fixed at build time): Whether the chat model should be enabled. | boolean | `true` |
| `quarkus.langchain4j.ollama.embedding-model.enabled` (fixed at build time): Whether the embedding model should be enabled. | boolean | `true` |
| `quarkus.langchain4j.ollama.base-url`: Base URL where the Ollama server is running. | string | |
| `quarkus.langchain4j.ollama.timeout`: Timeout for Ollama calls. | Duration | |
| `quarkus.langchain4j.ollama.log-requests`: Whether the Ollama client should log requests. | boolean | |
| `quarkus.langchain4j.ollama.log-responses`: Whether the Ollama client should log responses. | boolean | |
| `quarkus.langchain4j.ollama.enable-integration`: Whether or not to enable the integration. Defaults to `true`. | boolean | `true` |
| `quarkus.langchain4j.ollama.chat-model.model-id`: Model to use. | string | `llama2` |
| `quarkus.langchain4j.ollama.chat-model.temperature`: The temperature of the model. Increasing the temperature makes the model answer with more variability; lowering it makes the model answer more conservatively. | double | |
| `quarkus.langchain4j.ollama.chat-model.num-predict`: Maximum number of tokens to predict when generating text. | int | |
| `quarkus.langchain4j.ollama.chat-model.stop`: The stop sequences to use. When this pattern is encountered, the LLM stops generating text and returns. | string | |
| `quarkus.langchain4j.ollama.chat-model.top-p`: Works together with top-k. A higher value (e.g. 0.95) leads to more diverse text, while a lower value (e.g. 0.5) generates more focused and conservative text. | double | |
| `quarkus.langchain4j.ollama.chat-model.top-k`: Reduces the probability of generating nonsense. A higher value (e.g. 100) gives more diverse answers, while a lower value (e.g. 10) is more conservative. | int | |
| `quarkus.langchain4j.ollama.chat-model.seed`: Seed for sampling. With a static number the result is always the same; with a random number (for example `new Random().nextInt(Integer.MAX_VALUE)`) the result varies. | int | |
| `quarkus.langchain4j.ollama.embedding-model.model-id`: Model to use. | string | `nomic-embed-text` |
| `quarkus.langchain4j.ollama.embedding-model.temperature`: The temperature of the model. Increasing the temperature makes the model answer with more variability; lowering it makes the model answer more conservatively. | double | |
| `quarkus.langchain4j.ollama.embedding-model.num-predict`: Maximum number of tokens to predict when generating text. | int | |
| `quarkus.langchain4j.ollama.embedding-model.stop`: The stop sequences to use. When this pattern is encountered, the LLM stops generating text and returns. | string | |
| `quarkus.langchain4j.ollama.embedding-model.top-p`: Works together with top-k. A higher value (e.g. 0.95) leads to more diverse text, while a lower value (e.g. 0.5) generates more focused and conservative text. | double | |
| `quarkus.langchain4j.ollama.embedding-model.top-k`: Reduces the probability of generating nonsense. A higher value (e.g. 100) gives more diverse answers, while a lower value (e.g. 10) is more conservative. | int | |

The same client and model properties (base-url, timeout, log-requests, log-responses, enable-integration, and the chat-model and embedding-model properties listed above) are also available for named model configurations, under the `quarkus.langchain4j.ollama."model-name"` prefix.
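As an illustration, a few of these properties could be combined in application.properties; the values below are arbitrary and the property names follow the table above:

# Point the extension at a different Ollama server (the URL is illustrative)
quarkus.langchain4j.ollama.base-url=http://ollama.example.com:11434
# Log the HTTP exchanges with Ollama during development
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=true
# Make the chat model answer more conservatively
quarkus.langchain4j.ollama.chat-model.temperature=0.2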
About the Duration format
To write duration values, use the standard java.time.Duration format. You can also use a simplified format, starting with a number:
If the value is only a number, it represents time in seconds.
If the value is a number followed by ms, it represents time in milliseconds.
In other cases, the simplified format is translated to the java.time.Duration format for parsing:
If the value is a number followed by h, m, or s, it is prefixed with PT.
If the value is a number followed by d, it is prefixed with P.
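For example, assuming the quarkus.langchain4j.ollama.timeout property from the table above, the same value can be written in either style:

# Simplified format: 60 seconds
quarkus.langchain4j.ollama.timeout=60s
# Equivalent ISO-8601 java.time.Duration format
#quarkus.langchain4j.ollama.timeout=PT60S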
Document Retriever and Embedding
Ollama also provides embedding models.
By default, it uses nomic-embed-text (make sure you pulled that model as indicated in the prerequisites section).
You can change the default embedding model by setting the quarkus.langchain4j.ollama.embedding-model.model-id property in the application.properties file:
quarkus.langchain4j.ollama.log-requests=true
quarkus.langchain4j.ollama.log-responses=true
quarkus.langchain4j.ollama.chat-model.model-id=mistral
quarkus.langchain4j.ollama.embedding-model.model-id=mistral
If no other LLM extension is installed, retrieve the embedding model as follows:
@Inject EmbeddingModel model; // Injects the embedding model
However, in general, we recommend using in-process embedding models (running directly inside the application), as Ollama embeddings are rather slow.
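As a quick sketch of how the injected model might be used (class and method names are illustrative):

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.output.Response;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class EmbeddingExample {

    @Inject
    EmbeddingModel model; // backed by the configured Ollama embedding model

    public float[] embed(String text) {
        // Sends the text to the Ollama server and returns the embedding vector
        Response<Embedding> response = model.embed(text);
        return response.content().vector();
    }
}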