LLM Models API (BETA)

Last modified by Admin on 2026/03/19 00:16


APIs needed for the LLM Application

Type	JAR
Category	API
Developed by	Matéo Munoz, Ludovic Dubost, Michael Hamann, Paul Pantiru
Active Installs	0
Rating	0.0 (0 votes)
License	GNU Lesser General Public License 2.1

Compatibility

XWiki 16.2.0 since version 0.3, XWiki 14.10 for versions before 0.3

Installable with the Extension Manager

Sources Issues Download v0.8

Table of contents

Description
REST API
Chat Request Filters
Prerequisites & Installation Instructions
Versions
Dependencies

Description

This extension provides various APIs that are used by the LLM Application. It also implements a REST API that provides prompts configured in the LLM Application, lists the configured models, and offers a chat completion API that implements a subset of the OpenAI chat completion API.

On the Java side, this API also provides ways to interact with LLMs directly.

REST API

The REST API documented here has existed since version 0.3 of the extension. Before that, a different, undocumented API was provided.

Prompts

GET /wikis/{wikiName}/aiLLM/v1/prompts

Lists all configured prompts. Each prompt has the following properties:

name: the user-visible name of the prompt
prompt: the system prompt to use
user_prompt: the prompt to use
description: the description of the prompt
is_active: if the prompt is enabled
default: if the prompt is the default, only one prompt should have this set to true
temperature: the temperature to use for the chat completion
page_name: the name of the XWiki page that defines the prompt
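
As an illustration of how a client might use these properties, the following minimal sketch models one prompt entry and picks the prompt to preselect. The names PromptInfo and PromptSelector, the Java types of the fields, and the fallback to the first active prompt are assumptions made for this example; they are not part of the extension.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical client-side view of one entry returned by the prompts endpoint.
// The components mirror the JSON properties listed above; the Java types are assumptions.
record PromptInfo(String name, String prompt, String userPrompt, String description,
    boolean isActive, boolean isDefault, double temperature, String pageName)
{
}

final class PromptSelector
{
    private PromptSelector()
    {
    }

    // Prefer the active prompt flagged as default; otherwise fall back to the first active one.
    static Optional<PromptInfo> selectDefault(List<PromptInfo> prompts)
    {
        Optional<PromptInfo> flagged = prompts.stream()
            .filter(p -> p.isActive() && p.isDefault())
            .findFirst();
        if (flagged.isPresent()) {
            return flagged;
        }
        return prompts.stream().filter(PromptInfo::isActive).findFirst();
    }
}
```

A caller would deserialize the JSON response into PromptInfo objects with the JSON library of its choice and then call PromptSelector.selectDefault(prompts).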

Models

GET /wikis/{wikiName}/aiLLM/v1/models

Lists all models that are available to the current user. This API follows the OpenAI models endpoint but provides additional metadata for each model. For every model, the following properties are provided:

id: The reference of the page in the wiki where the model is defined
name: The "nice" name that should be displayed for users
context_length: the supported context length
can_stream: if the model supports streaming requests

An example response with a single model could look like this:

{
  "data": [
    {
      "id": "AI.Models.Llama 3 8B",
      "name": "Llama 3 8B",
      "context_length": 8000,
      "can_stream": true
    }
  ],
  "object": "list",
  "first_id": null,
  "last_id": null,
  "has_more": false
}
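
A request for this endpoint could be built with the HTTP client of the Java standard library, as in the following minimal sketch. The class name LlmRestRequests is hypothetical, and the REST root passed in (for example https://wiki.example.org/xwiki/rest) is an assumption about the deployment. Authentication is left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class LlmRestRequests
{
    private LlmRestRequests()
    {
    }

    // restRoot is an assumption about the deployment, e.g. "https://wiki.example.org/xwiki/rest".
    static URI endpoint(String restRoot, String wikiName, String resource)
    {
        String base = restRoot.endsWith("/") ? restRoot.substring(0, restRoot.length() - 1) : restRoot;
        // URLEncoder produces form encoding, so "+" is replaced by "%20" for use in a path segment.
        String wiki = URLEncoder.encode(wikiName, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(base + "/wikis/" + wiki + "/aiLLM/v1/" + resource);
    }

    // Builds (but does not send) the GET request for the models endpoint.
    static HttpRequest listModels(String restRoot, String wikiName)
    {
        return HttpRequest.newBuilder(endpoint(restRoot, wikiName, "models"))
            .header("Accept", "application/json")
            .GET()
            .build();
    }
}
```

The resulting request can be sent with java.net.http.HttpClient, and the same endpoint helper also works for the prompts resource.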

Chat Completions

POST /wikis/{wikiName}/aiLLM/v1/chat/completions

This API follows the specification of the OpenAI chat completion API, without support for tools or function calls. For streaming requests, returning usage information is always enabled, and the usage information is forwarded if the LLM provider returns it.
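
To make the request format concrete, the following minimal sketch builds a request body with the model, messages and stream fields of the OpenAI chat completion format. The names ChatMessage and ChatCompletionBody are hypothetical, and the sketch assumes that the model field takes the id returned by the models endpoint. A real client would normally use a JSON library instead of manual string building.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper type: one message of the conversation.
record ChatMessage(String role, String content)
{
}

final class ChatCompletionBody
{
    private ChatCompletionBody()
    {
    }

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String quote(String value)
    {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    // Assumption: modelId is the "id" value returned by the models endpoint.
    static String build(String modelId, List<ChatMessage> messages, boolean stream)
    {
        String msgs = messages.stream()
            .map(m -> "{\"role\":" + quote(m.role()) + ",\"content\":" + quote(m.content()) + "}")
            .collect(Collectors.joining(","));
        return "{\"model\":" + quote(modelId) + ",\"messages\":[" + msgs + "],\"stream\":" + stream + "}";
    }
}
```

The returned string would be sent as the body of the POST request with the Content-Type header set to application/json.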

LLM Application 0.3+

Chat Request Filters

The LLM Application supports filters to modify both the incoming chat completion request and the response. The filter system, which is also used to provide the context, is designed so that other extensions can add their own filters. It is similar to Wiki Components, though there is only a single real component for every model, and the filters are stored inside that component. A filter consists of at least three parts:

An XClass to store its configuration
A sheet to display and edit that configuration
A component implementing org.xwiki.contrib.llm.ChatRequestFilterBuilder that provides references to the XClass and the sheet and has a method to construct a list of no, one, or several org.xwiki.contrib.llm.ChatRequestFilter objects from a BaseObject (of the XClass of the configuration)

All available filters are always enabled. It is up to each filter to ensure that it has a sensible default state and to provide a way to disable it if that makes sense. When the filter is disabled, the ChatRequestFilterBuilder can return an empty list so that no filter is inserted into the filter chain.

Filters use the chain of responsibility design pattern: a chat completion request is forwarded from filter to filter until it reaches the component that is responsible for making the actual request to the LLM. This gives filters a lot of flexibility. Filters can:

Modify the incoming chat completion request, e.g., by adding context as the context filter does.
Decide to stop propagating the request, e.g., based on a rate limit or a content filter, and reply with an error message instead. This could also be used to implement a caching system or to log all requests.
Intercept the response and modify it, e.g., to add information, to filter results based on some criteria, or to log responses.

Both request and response are available at the same time, which makes it easy to pass information between them or, e.g., to log both together.
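
The pattern described above can be sketched with simplified stand-in types. None of the classes below belong to the org.xwiki.contrib.llm API: requests and responses are plain strings, and all class names are hypothetical. The sketch only illustrates ordering by priority, stopping propagation, modifying the request, and seeing request and response together.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified stand-in for a filter; NOT the real org.xwiki.contrib.llm API.
abstract class SketchFilter
{
    final int priority;

    private SketchFilter next;

    SketchFilter(int priority) { this.priority = priority; }

    void setNext(SketchFilter next) { this.next = next; }

    // Default behaviour: forward the request unchanged to the next element of the chain.
    String process(String request) { return this.next.process(request); }
}

// End of the chain; stands in for the component that performs the actual LLM request.
class EchoModel extends SketchFilter
{
    EchoModel() { super(Integer.MAX_VALUE); }

    @Override
    String process(String request) { return "echo(" + request + ")"; }
}

// Stops propagation and answers with an error message for rejected requests.
class BlockingFilter extends SketchFilter
{
    BlockingFilter(int priority) { super(priority); }

    @Override
    String process(String request)
    {
        if (request.contains("forbidden")) {
            return "error: request rejected";
        }
        return super.process(request);
    }
}

// Modifies the incoming request before forwarding it.
class AddContextFilter extends SketchFilter
{
    AddContextFilter(int priority) { super(priority); }

    @Override
    String process(String request) { return super.process("[context] " + request); }
}

// Sees request and response together, so both can be logged in one place.
class LoggingFilter extends SketchFilter
{
    final List<String> log = new ArrayList<>();

    LoggingFilter(int priority) { super(priority); }

    @Override
    String process(String request)
    {
        String response = super.process(request);
        this.log.add(request + " -> " + response);
        return response;
    }
}

final class SketchChain
{
    private SketchChain() { }

    // Links the filters in ascending priority order and returns the head of the chain.
    static SketchFilter build(List<SketchFilter> filters, SketchFilter model)
    {
        List<SketchFilter> sorted = new ArrayList<>(filters);
        sorted.sort(Comparator.comparingInt(f -> f.priority));
        SketchFilter next = model;
        for (int i = sorted.size() - 1; i >= 0; i--) {
            sorted.get(i).setNext(next);
            next = sorted.get(i);
        }
        return next;
    }
}
```

Calling process on the filter returned by SketchChain.build runs the whole chain. A filter that returns without calling super.process ends the chain early, so later filters and the model are never reached.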

The priority of the ChatRequestFilterBuilder component determines the order of the filters: filters with a lower priority are called first. To simplify the implementation of filters, the abstract class org.xwiki.contrib.llm.AbstractChatRequestFilter is provided, which implements the forwarding to the next filter.

The LLM Application supports two request modes: streaming and non-streaming. For streaming requests, the LLM provides a stream of tokens that are forwarded to the client one by one. In the LLM Application, this is implemented with a callback that is called for every chat completion chunk. To modify the result, a filter can replace the callback that it forwards with a custom callback. A filter could also buffer a number of tokens and forward them as a whole paragraph only if the paragraph passes the content filtering guidelines. It is also possible to add a prefix to the output by calling the callback before forwarding the call in the filter chain. For non-streaming requests, the full result is returned by the filter chain and can thus be examined and modified as a whole.

In both modes, the request can be modified before forwarding it, and forwarding the request by calling the parent method is optional. The callback for streaming requests may throw an IOException. This happens in particular when the client closes the connection because the user cancelled the chat completion request. Both streaming and non-streaming requests are currently fully synchronous. This might change, in particular for streaming requests, should it turn out to be a problem.

public class MyChatRequestFilter extends AbstractChatRequestFilter
{
    @Override
    public void processStreaming(ChatCompletionRequest request,
        FailableConsumer<ChatCompletionChunk, IOException> consumer) throws IOException, RequestError
    {
        // Modify the request or replace/wrap the consumer if desired.
        // Calling the parent method forwards the request to the next filter in the chain.
        super.processStreaming(request, consumer);
    }

    @Override
    public ChatCompletionResult process(ChatCompletionRequest request) throws IOException, RequestError
    {
        // Modify the request before forwarding it if desired.
        ChatCompletionResult result = super.process(request);
        // Examine or modify the result here if desired.
        return result;
    }
}
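
The idea of buffering tokens until a paragraph is complete can be sketched as a wrapper around the streaming callback. To stay self-contained, the sketch uses a hypothetical ChunkSink interface with plain strings in place of FailableConsumer<ChatCompletionChunk, IOException>. The class name ParagraphBufferingSink, the blank-line paragraph delimiter, and the predicate used as content check are likewise assumptions.

```java
import java.io.IOException;
import java.util.function.Predicate;

// Stand-in for FailableConsumer<ChatCompletionChunk, IOException>, using strings as chunks.
interface ChunkSink
{
    void accept(String chunk) throws IOException;
}

// Buffers chunks until a paragraph is complete, then forwards it only if the check accepts it.
final class ParagraphBufferingSink implements ChunkSink
{
    private final ChunkSink delegate;

    private final Predicate<String> allowed;

    private final StringBuilder buffer = new StringBuilder();

    ParagraphBufferingSink(ChunkSink delegate, Predicate<String> allowed)
    {
        this.delegate = delegate;
        this.allowed = allowed;
    }

    @Override
    public void accept(String chunk) throws IOException
    {
        this.buffer.append(chunk);
        int end;
        // Assumption: a blank line ("\n\n") marks the end of a paragraph.
        while ((end = this.buffer.indexOf("\n\n")) >= 0) {
            emit(this.buffer.substring(0, end + 2));
            this.buffer.delete(0, end + 2);
        }
    }

    // Must be called after the last chunk to forward the remaining text.
    void flush() throws IOException
    {
        if (this.buffer.length() > 0) {
            emit(this.buffer.toString());
            this.buffer.setLength(0);
        }
    }

    private void emit(String paragraph) throws IOException
    {
        if (this.allowed.test(paragraph)) {
            this.delegate.accept(paragraph);
        }
    }
}
```

In a real filter, a wrapper of this kind would be passed to super.processStreaming in place of the original consumer, and flush would be called once that call returns. Note that buffering delays the output, so the client sees whole paragraphs instead of single tokens.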

For the sheet, when using editable properties, it is important to set data-object-policy="updateOrCreate" because the XObject won't be present on models that were created before the extension was installed. The sheet for models takes care of adding the XObject in regular view and edit mode, so in these cases the presence of the XObject can be assumed.

Prerequisites & Installation Instructions

We recommend using the Extension Manager to install this extension. Make sure that the text "Installable with the Extension Manager" is displayed at the top right of this page to confirm that the extension can be installed this way.

You can also use the manual method, which involves dropping the JAR file and all its dependencies into the WEB-INF/lib folder and restarting XWiki.

Versions

See LLM

Dependencies

Dependencies for this extension (org.xwiki.contrib.llm:application-ai-llm-models-api 0.8):

org.xwiki.platform:xwiki-platform-rest-api 17.4.4
org.xwiki.platform:xwiki-platform-rest-server 17.4.4
org.xwiki.platform:xwiki-platform-search-solr-query 17.4.4
org.xwiki.platform:xwiki-platform-user-api 17.4.4
org.xwiki.platform:xwiki-platform-component-wiki 17.4.4
com.theokanning.openai-gpt3-java:api 0.18.2
