LLM Models API (BETA)
| APIs needed for the LLM Application |
| Type | JAR |
| Category | API |
| Developed by | Matéo Munoz, Ludovic Dubost, Michael Hamann, Paul Pantiru |
| Active Installs | 0 |
| License | GNU Lesser General Public License 2.1 |
| Compatibility | XWiki 16.2.0 since version 0.3, XWiki 14.10 for versions before 0.3 |
Description
This extension provides various APIs that are used by the LLM Application. It also implements a REST API that provides prompts configured in the LLM Application, lists the configured models, and offers a chat completion API that implements a subset of the OpenAI chat completion API.
On the Java side, this API also provides ways to interact with LLMs directly.
REST API
The REST API documented here has existed since version 0.3 of the extension; before that, a different, undocumented API was provided.
Prompts
GET /wikis/{wikiName}/aiLLM/v1/prompts
Lists all configured prompts. Each prompt has the following properties:
- name: the user-visible name of the prompt
- prompt: the system prompt to use
- user_prompt: the user prompt to use
- description: the description of the prompt
- is_active: whether the prompt is enabled
- default: whether the prompt is the default; only one prompt should have this set to true
- temperature: the temperature to use for the chat completion
- page_name: the name of the XWiki page that defines the prompt
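A minimal sketch of calling the prompts endpoint with the JDK's java.net.http client follows. It assumes that the REST API is reachable under a base URL such as https://wiki.example.com/xwiki/rest (a hypothetical host) and only builds the request; authentication is left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Minimal sketch: builds (but does not send) a GET request for the prompts endpoint.
// Assumption: restBaseUrl is the wiki's REST base URL, e.g. the hypothetical
// "https://wiki.example.com/xwiki/rest".
final class PromptsRequest
{
    private PromptsRequest()
    {
    }

    static HttpRequest build(String restBaseUrl, String wikiName)
    {
        // URLEncoder uses form encoding ('+' for spaces), so '+' is replaced by "%20"
        // to get a valid path segment.
        String encodedWiki = URLEncoder.encode(wikiName, StandardCharsets.UTF_8).replace("+", "%20");
        URI uri = URI.create(restBaseUrl + "/wikis/" + encodedWiki + "/aiLLM/v1/prompts");
        return HttpRequest.newBuilder(uri)
            .header("Accept", "application/json")
            .GET()
            .build();
    }
}
```

To actually send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and parse the JSON body with the JSON library of your choice; the wiki user's credentials still need to be supplied, for example through an Authenticator configured on the HttpClient builder.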
Models
GET /wikis/{wikiName}/aiLLM/v1/models
Lists all models that are available to the current user. This API follows the OpenAI models endpoint but provides additional metadata for each model. For every model, the following properties are provided:
- id: the reference of the page in the wiki where the model is defined
- name: the "nice" name that should be displayed to users
- context_length: the supported context length
- can_stream: whether the model supports streaming requests
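On the client side, one model entry could be represented as a Java record like the following. This is a hypothetical sketch: the record, its component names, and the helper methods are not part of the extension, and mapping the JSON property names (context_length, can_stream) to the record components is assumed to be done by a JSON library.

```java
import java.util.List;

// Hypothetical client-side representation of one entry of the "data" array returned by
// the models endpoint. JSON parsing is not shown.
record ModelInfo(String id, String name, int contextLength, boolean canStream)
{
    // Keeps only the models that support streaming requests.
    static List<ModelInfo> streamingModels(List<ModelInfo> models)
    {
        return models.stream().filter(ModelInfo::canStream).toList();
    }

    // Rough check whether a prompt of the given size (in tokens) plus the expected
    // answer fits into the model's context length.
    boolean fits(int promptTokens, int maxAnswerTokens)
    {
        return promptTokens + maxAnswerTokens <= contextLength;
    }
}
```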
An example response with a single model could look like this:
{
  "data": [
    {
      "id": "AI.Models.Llama 3 8B",
      "name": "Llama 3 8B",
      "context_length": 8000,
      "can_stream": true
    }
  ],
  "object": "list",
  "first_id": null,
  "last_id": null,
  "has_more": false
}
Chat Completions
POST /wikis/{wikiName}/aiLLM/v1/chat/completions
This API follows the specification of the OpenAI chat completion API but does not support tools or function calls. For streaming requests, returning usage information is always enabled, and the usage information is forwarded if the underlying LLM provider returns it.
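A sketch of a non-streaming chat completion call in the OpenAI request format follows. The base URL is a hypothetical assumption, the model is assumed to be identified by the id returned by the models endpoint, and the JSON body is assembled by hand only to keep the example free of dependencies.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Minimal sketch: builds (but does not send) a non-streaming chat completion request.
// Real code should use a JSON library instead of string concatenation.
final class ChatCompletionCall
{
    private ChatCompletionCall()
    {
    }

    // Escapes the characters that must be escaped inside a JSON string literal.
    static String jsonString(String value)
    {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    // OpenAI-style body with a system and a user message; "stream" is disabled.
    static String body(String modelId, String systemPrompt, String userMessage)
    {
        return "{\"model\":" + jsonString(modelId)
            + ",\"stream\":false"
            + ",\"messages\":["
            + "{\"role\":\"system\",\"content\":" + jsonString(systemPrompt) + "},"
            + "{\"role\":\"user\",\"content\":" + jsonString(userMessage) + "}]}";
    }

    // Assumes a wiki name without characters that need URL encoding.
    static HttpRequest build(String restBaseUrl, String wikiName, String json)
    {
        URI uri = URI.create(restBaseUrl + "/wikis/" + wikiName + "/aiLLM/v1/chat/completions");
        return HttpRequest.newBuilder(uri)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }
}
```

The system prompt could, for instance, be taken from the prompt property returned by the prompts endpoint.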
LLM Application 0.3+
Chat Request Filters
The LLM Application supports filters that modify both the incoming chat completion request and the response. The filter system, which is also used to add context, is designed to be extensible by other extensions. It is similar to Wiki Components, though there is only a single real component for every model, and the filters are stored inside that component. A filter consists of at least three parts:
- An XClass to store its configuration
- A sheet to display and edit that configuration
- A component implementing org.xwiki.contrib.llm.ChatRequestFilterBuilder that provides references to the XClass and the sheet and has a method to construct a list of zero, one, or several org.xwiki.contrib.llm.ChatRequestFilter objects from a BaseObject (of the configuration XClass)
All available filters are always enabled; it is up to each filter to ensure that it has a sensible default state and to provide a way to disable it where that makes sense. When the filter is disabled, the ChatRequestFilterBuilder can return an empty list so that no filter is inserted into the filter chain.
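The builder contract described above (return an empty list when the filter is disabled) can be illustrated with a simplified, self-contained model. It deliberately does not use the real ChatRequestFilterBuilder or BaseObject types: the configuration is a plain Map, and the enabled and prefix properties as well as the SimpleFilter interface are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a filter; not the extension's ChatRequestFilter.
interface SimpleFilter
{
    String apply(String userMessage);
}

// Hypothetical builder: turns a configuration object into zero or one filter.
final class PrefixFilterBuilder
{
    List<SimpleFilter> build(Map<String, Object> configuration)
    {
        // Sensible default: enabled unless the configuration explicitly disables it,
        // e.g. on models created before the configuration object existed.
        boolean enabled = !Boolean.FALSE.equals(configuration.get("enabled"));
        if (!enabled) {
            // An empty list means that nothing is inserted into the filter chain.
            return List.of();
        }
        String prefix = String.valueOf(configuration.getOrDefault("prefix", ""));
        return List.of(message -> prefix + message);
    }
}
```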
Filters use the chain of responsibility design pattern to forward a chat completion request through a chain of filters until it reaches the component that is responsible for making the actual request to the LLM. This gives filters a lot of flexibility; filters can:
- Modify the incoming chat completion request, e.g., by adding context as the context filter does.
- Decide to stop propagating the request, e.g., based on a rate limit or a content filter, and reply with an error message instead. This could also be used to implement a caching system or to log all requests.
- Intercept the response and modify it, e.g., to add additional information, to filter results based on some criteria, or to log responses.
Both request and response are available at the same time, which makes it easy to pass information between them or, e.g., to log both together.
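The chain of responsibility can be sketched with hypothetical stand-in types, where a request and a response are plain strings and ChainFilter plays the role of AbstractChatRequestFilter. The example shows a filter that modifies the request, one that stops propagation, and one that sees the request and the response together; it illustrates the pattern and is not the extension's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: forwards to the next element unless a subclass decides otherwise.
abstract class ChainFilter
{
    private ChainFilter next;

    // Links the next element and returns it, so that chains can be built fluently.
    ChainFilter setNext(ChainFilter next)
    {
        this.next = next;
        return next;
    }

    String process(String request)
    {
        return next.process(request);
    }
}

// Last element: stands in for the component that makes the actual request to the LLM.
final class FakeLlm extends ChainFilter
{
    @Override
    String process(String request)
    {
        return "answer to: " + request;
    }
}

// Modifies the request before forwarding it (like the context filter).
final class ContextFilter extends ChainFilter
{
    @Override
    String process(String request)
    {
        return super.process("[context] " + request);
    }
}

// Stops propagation once the limit is exceeded and replies with an error instead.
final class RateLimitFilter extends ChainFilter
{
    private final int limit;

    private int count;

    RateLimitFilter(int limit)
    {
        this.limit = limit;
    }

    @Override
    String process(String request)
    {
        if (++count > limit) {
            return "error: rate limit exceeded";
        }
        return super.process(request);
    }
}

// Sees the request and the response at the same time, e.g. to log both.
final class LoggingFilter extends ChainFilter
{
    final List<String> log = new ArrayList<>();

    @Override
    String process(String request)
    {
        String response = super.process(request);
        log.add(request + " -> " + response);
        return response;
    }
}
```

Linking the rate limit filter first mirrors the rule that the filter whose builder has the lowest priority is called first.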
The priority of the ChatRequestFilterBuilder component determines the order of the filters: filters with a lower priority are called first. To simplify the implementation of filters, the abstract class org.xwiki.contrib.llm.AbstractChatRequestFilter is provided; it implements the forwarding to the next filter.

The LLM Application supports two request modes: streaming and non-streaming. For streaming requests, the LLM provides a stream of tokens that are forwarded to the client one by one. In the LLM Application, this is implemented with a callback that is called for every chat completion chunk. To modify the result, a filter can replace the callback it forwards with a custom callback. A filter could also buffer a number of tokens and only forward them as a whole paragraph once the paragraph passes the content filtering guidelines. It is also possible to add a prefix to the output by calling the callback before forwarding the call in the filter chain. For non-streaming requests, the full result is returned by the filter chain and can thus be examined and modified as a whole.

In both modes, the request can be modified before forwarding it, and forwarding the request by calling the parent method is optional. The callback for streaming requests may throw an IOException; this happens in particular when the client closes the connection because the user cancelled the chat completion request. Both streaming and non-streaming requests are currently fully synchronous. This might change in the future, in particular for streaming requests, should it turn out to be a problem.
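Wrapping the streaming callback, for example to buffer tokens until a paragraph is complete, could look roughly like the sketch below. ChunkCallback is a hypothetical stand-in for FailableConsumer<ChatCompletionChunk, IOException>, chunks are plain strings, and the paragraph separator ("\n\n") and the content check are assumptions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the streaming callback; a chunk is just a String here.
@FunctionalInterface
interface ChunkCallback
{
    void accept(String chunk) throws IOException;
}

// Wraps the downstream callback: buffers chunks until a paragraph ends ("\n\n") and only
// forwards the paragraph if it passes the content check; otherwise a placeholder is sent.
final class ParagraphBufferingCallback implements ChunkCallback
{
    private final ChunkCallback downstream;

    private final Predicate<String> allowed;

    private final StringBuilder buffer = new StringBuilder();

    ParagraphBufferingCallback(ChunkCallback downstream, Predicate<String> allowed)
    {
        this.downstream = downstream;
        this.allowed = allowed;
    }

    @Override
    public void accept(String chunk) throws IOException
    {
        buffer.append(chunk);
        int end;
        while ((end = buffer.indexOf("\n\n")) >= 0) {
            String paragraph = buffer.substring(0, end + 2);
            buffer.delete(0, end + 2);
            emit(paragraph);
        }
    }

    // Forwards the last, unterminated paragraph once the stream is complete.
    void flush() throws IOException
    {
        if (buffer.length() > 0) {
            String rest = buffer.toString();
            buffer.setLength(0);
            emit(rest);
        }
    }

    // An IOException from the downstream callback (e.g. a closed connection) propagates.
    private void emit(String paragraph) throws IOException
    {
        downstream.accept(allowed.test(paragraph) ? paragraph : "[removed]\n\n");
    }
}
```

Because requests are processed synchronously, a real filter could pass such a wrapper instead of the original consumer to the parent method and call flush() right after that call returns.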
import java.io.IOException;

// FailableConsumer is assumed to be the Apache Commons Lang 3 interface.
import org.apache.commons.lang3.function.FailableConsumer;
import org.xwiki.contrib.llm.AbstractChatRequestFilter;
// ChatCompletionRequest, ChatCompletionChunk, ChatCompletionResult and RequestError are
// provided by the LLM Application API; import them from the packages used by your version.

public class MyChatRequestFilter extends AbstractChatRequestFilter
{
    @Override
    public void processStreaming(ChatCompletionRequest request,
        FailableConsumer<ChatCompletionChunk, IOException> consumer) throws IOException, RequestError
    {
        // Modify the request or replace/wrap the consumer if desired.
        super.processStreaming(request, consumer);
    }

    @Override
    public ChatCompletionResult process(ChatCompletionRequest request) throws IOException, RequestError
    {
        // Modify the request before forwarding it or the result after it has been returned.
        return super.process(request);
    }
}

For the sheet, when using editable properties, it is important to set data-object-policy="updateOrCreate" because the XObject won't be present on models that were created before the extension was installed. The sheet for models takes care of adding the XObject in regular view and edit mode, so in these cases the presence of the XObject can be assumed.
Prerequisites & Installation Instructions
We recommend using the Extension Manager to install this extension (make sure that the text "Installable with the Extension Manager" is displayed at the top right of this page to know whether this extension can be installed with the Extension Manager).
You can also use the manual method which involves dropping the JAR file and all its dependencies into the WEB-INF/lib folder and restarting XWiki.
Dependencies
Dependencies for this extension (org.xwiki.contrib.llm:application-ai-llm-models-api 0.8):
- org.xwiki.platform:xwiki-platform-rest-api 17.4.4
- org.xwiki.platform:xwiki-platform-rest-server 17.4.4
- org.xwiki.platform:xwiki-platform-search-solr-query 17.4.4
- org.xwiki.platform:xwiki-platform-user-api 17.4.4
- org.xwiki.platform:xwiki-platform-component-wiki 17.4.4
- com.theokanning.openai-gpt3-java:api 0.18.2